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Abstract 

A high-performance shooting algorithm is developed to compute time-periodic solutions of the free-surface Euler 
equations with spectral accuracy in double and quadruple precision. The method is used to study resonance and its 
effect on standing water waves. We identify new nucleation mechanisms in which isolated large-amplitude solutions, 
and closed loops of such solutions, suddenly exist for depths below a critical threshold. We also study degenerate and 
secondary bifurcations related to Wilton's ripples in the traveling case, and explore the breakdown of self-similarity 
at the crests of extreme standing waves. In shallow water, we find that standing waves take the form of counter- 
propagating solitary waves that repeatedly collide quasi-elastically. In deep water with surface tension, we find that 
standing waves resemble counter-propagating depression waves. We also discuss existence and non-uniqueness of 
solutions, and smooth versus erratic dependence of Fourier modes on wave amplitude and fluid depth. 

In the numerical method, robustness is achieved by posing the problem as an overdetermined nonlinear system and 
using either adjoint-based minimization techniques or a quadratically convergent trust-region method to minimize the 
objective function. Efficiency is achieved in the trust-region approach by parallelizing the Jacobian computation so the 
setup cost of computing the Dirichlet-to-Neumann operator in the variational equation is not repeated for each column. 
Updates of the Jacobian are also delayed until the previous Jacobian ceases to be useful. Accuracy is maintained using 
spectral collocation with optional mesh refinement in space, a high order Runge-Kutta or spectral deferred correction 
method in time, and quadruple-precision for improved navigation of delicate regions of parameter space as well as 
validation of double-precision results. Implementation issues for GPU acceleration are briefly discussed, and the 
performance of the algorithm is tested for a number of hardware configurations. 

Keywords: water waves, standing waves, resonance, bifurcation, Wilton's ripples, trust-region shooting method, 
boundary integral method, spectral deferred correction, GPU acceleration, quadruple precision 
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1. Introduction 



Time-periodic solutions of the free-surface Euler equations serve as an excellent benchmark for the design and 
implementation of numerical algorithms for two-point boundary value problems governed by nonlinear partial differ- 
ential equations. In particular, there is a large body of existing work on numerical methods for computing standing 
waves |Tl|2][3]|4]|5j|6]|7][8l and short-crested waves ||9l [ISl [HI [I2l for performance comparison. Moreover, many of 
these previous studies reach contradictory scientific conclusions that warrant further investigation, especially concern- 
ing extreme waves and the formation of a corner or cusp. Penney and Price lfT3l predicted a 90 degree corner, which 
was verified experimentally by G. I. Taylor [T4], who was nevertheless skeptical of their analysis. Grant |15| and 
Okamura I.16J gave theoretical arguments supporting the 90 degree comer Schwartz and Whitney [1] and Okamura 
|[8l performed numerical experiments that backed the 90 degree conjecture. Mercer and Roberts ||2l predicted a some- 
what sharper angle and mentioned 60 degrees as a possibility. Schultz et al. |7 | obtained results similar to Mercer and 
Roberts, and proposed that a cusp may actually form rather than a corner. Wilkening [171 showed that extreme waves 
do not approach a limiting wave at all due to fine scale structure that emerges at the surface of very large amplitude 
waves and prevents the wave crest from sharpening in a self-similar manner. This raises many new questions about 



the behavior of large-amplitude standing waves, which we will explore in Section 4.4 



On the theoretical side, it has long been known IfTSl [T9l l20l that standing water waves suffer from a small-divisor 
problem that obstructs convergence of the perturbation expansions developed by Rayleigh |21|, Penney and Price 
IJ3 I. Tadjbakhsh and Keller li22J . Concus [23 J . Schwartz and Whitney [JJ, and others. Penney and Price |, 1 3 1 went 
so far as to state, "there seems Uttle Ukelihood that a proof of the existence of the stationary waves will ever be 
given." Remarkably, Plotnikov and Toland fS?), together with looss f20l, have recently established existence of small- 
amplitude standing waves using a Nash-Moser iteration. As often happens in small-divisor problems [25 .26] , solutions 
could only be proved to exist for values of an amplitude parameter in a totally disconnected Cantor set. No assertion is 
made about parameter values outside of this set. This raises intriguing new questions about whether resonance really 
causes a complete loss of smoothness in the dependence of solutions on amplitude, or if these results are an artifact of 
the use of Nash-Moser theory to prove existence. While a complete answer can only come through further analysis, 
insight can be gained by studying high precision numerical solutions. 

In previous numerical studies, the most effective methods for computing standing water waves have been Fourier 
collocation in space and time ll27l l5l ISl l28l l29l . semi-analytic series expansions ||T1|30|, and shooting methods ^ 
[3]|2!'4l. In Fourier collocation, time-periodicity is built into the basis, and the equations of motion are imposed at 
collocation points to obtain a large nonlinear system of equations. This is the usual approach taken in analysis to prove 
existence of time-periodic solutions, e.g. of nonlinear wave equations [25] or nonlinear Schrodinger equations ll26l . 
The drawback as a numerical method is that the number of unknowns in the nonlinear system grows like (AxAf)"' 
rather than Ajc ' for a shooting method, which limits the resolution one can achieve. Orthogonal collocation, as 
implemented in the software package AUTO ||3T1 . would be less efficient than Fourier collocation as more timesteps 
will be required to achieve the same accuracy. 

The semi-analytic series expansions of Schwartz and Whitney ||Tl[30l are a significant improvement over previous 
perturbation methods ||2T1 [131 l22l l23l in that the authors show how to compute an arbitrary number of terms rather 
than stopping at 3rd or 5th order. They also used conformal mapping to flatten the boundary, which leads to a more 
promising representation of the solution of Laplace's equation. As a numerical method, the coefficients of the expan- 
sion are expensive to compute, which Umits the number of terms one can obtain in practice. (Schwartz and Whitney 
stopped at 25th order). It may also be that the resulting series is an asymptotic series rather than a convergent series. 
Nevertheless, these series expansions play an essential role in the proof of existence of standing waves on deep water 
by Plotnikov, Toland and looss EOll . 

In a shooting method, one augments the known boundary values at one endpoint with additional prescribed data 
to make the initial value problem well posed, then looks for values of the new data to satisfy the boundary conditions 
at the other endpoint. For ordinary differential equations, this normally leads to a system of equations with the same 
number of equations as unknowns. The same is true of multi-shooting methods l32l l33l [34l . When the boundary 
value problem is governed by a system of partial differential equations, it is customary to discretize the PDF to 
obtain an ODE, then proceed as described above. However, because of aliasing errors, quadrature errors, filtering 
errors, and amplification by the derivative operator, discretization causes larger errors in high-frequency modes than 
low-frequency modes when the solution is evolved in time. These errors can cause the shooting method to be too 
aggressive in its search for initial conditions, and to explore regions of parameter space (the space of initial conditions) 
where either the numerical solution is inaccurate, or the physical solution becomes singular before reaching the other 
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endpoint. Even if safeguards are put in place to penalize high-frequency modes in the search for initial conditions, the 
Jacobian is often poorly conditioned due to these discretization errors. 

We have found that posing boundary value problems governed by PDEs as overdetermined, nonlinear least squares 
problems can dramatically improve the robustness of shooting methods in two critical ways. First, we improve accu- 
racy by padding the initial condition with high-frequency modes that are constrained to be zero. With enough padding, 
all the degrees of freedom controlled by the shooting method can be resolved sufficiently to compute a reliable Jaco- 
bian. Second, adding more rows to the Jacobian increases its smallest singular values, often improving the condition 
number by several orders of magnitude. The extra rows come from including the high-frequency modes of the bound- 
ary conditions in the system of equations, even though they are not included in the list of augmented initial conditions. 
As a rule of thumb, it is usually sufficient to set the top 1/3 to 1/2 of the Fourier spectrum to zero initially; additional 
zero-padding has little effect on the numerical solution or the condition number Validation of accuracy by monitoring 
energy conservation and decay rates of Fourier modes will be discussed in Section 4.4 along with mesh refinement 
studies and comparison with quadruple precision calculations. 

In this paper, we present two methods of solving the nonlinear least squares problem that arises in the overde- 
termined shooting framework. The first is the adjoint continuation method (ACM) of Ambrose and Wilkening 
ll35l l36l l37l |38 1, in which the gradient of the objective function with respect to initial conditions is computed by 
solving an adjoint PDF, and the BFGS algorithm ll39l l40ll is used for the minimization. This was the approach used 
by one of the authors in her dissertation |41 1 to obtain the results of Sections 4.1 and 4.6 In the second approach, we 
exploit an opportunity for parallelism that makes computing the entire Jacobian feasible. Once this is done, a variant 
of the Levenberg-Marquardt method (with less frequent Jacobian updates) is used to rapidly converge to the solution. 
The main challenge here is organizing the computation to maximize re-use of setup costs in solving the variational 
equation with multiple right-hand sides, to minimize communication between threads or with the GPU device, and to 
ensure that most of the linear algebra occurs at level 3 BLAS speed. The performance of the algorithms on various 



platforms is reported in Section 4.7 



The scientific focus of the present work is on resonance and its effect on existence, non-uniqueness, and physical 
behavior of standing water waves. A summary of our main results is given in the abstract, and in more detail at the 
beginning of Section|4] We mention here that resonant modes generally take the form of higher-frequency, secondary 
standing waves oscillating at the surface of larger-scale, primary standing waves. Because the equations are nonlinear, 
only certain combinations of amplitude and phase can occur for each component wave. This leads to non-uniqueness 
through multiple branches of solutions. In shallow water, bifurcation curves of high-frequency Fourier modes behave 
erratically and contain many gaps where solutions do not appear to exist. This is expected on theoretical grounds. 
However, these bifurcation "curves" become smoother, or "heal," as fluid depth increases. In infinite depth, such 
resonant effects are largely invisible, which we quantify and discuss in Section|5] 

In future work [42|, the methods of this paper will be used to study other families of time-periodic solutions of the 
free-surface Euler equations with less symmetry than is assumed here, e.g. traveling-standing waves, unidirectional 
solitary wave interactions, and collisions of gravity-capillary solitary waves. The stability of these solutions will also 
be analyzed in using Floquet methods. 



2. Equations of motion and time-stepping 

The eff^ectiveness of a shooting algorithm for solving two-point boundary value problems is limited by the accuracy 
of the time-stepper In this section, we describe a boundary integral formulation of the water wave problem that is 
spectrally accurate in space and arbitrary order in time. We also describe how to implement the method in double 
and quadruple precision using a GPU, and discuss symmetries of the problem that can be exploited to reduce the 
work of computing standing waves by a factor of 4. The method is similar to other boundary integral formulations 
Il43l|44l|45]|21[3l|4]|46;|, but is simpler to implement than the angle-ai'dength formulation used in ll47l l48l l49l l37l . 
and avoids issues of identifying two curves that are equal "up to reparametrization" when the x and y coordinates 
of the interface are both evolved (in non-symmetric problems). Our approach also avoids sawtooth instabilities that 
sometimes occur when using Lagrangian markers f??, 2|. This is consistent with the results of Baker and Nachbin 
|50|, who found that sawtooth instabilities can be controlled without filtering using the correct combination of spectral 
differentiation and interpolation schemes. While conformal mapping methods ifSTl l52l |53]| are more efficient than 
boundary integral methods in many situations, they are not suitable for modeling extreme waves as the spacing between 
grid points expands severely in regions where wave crests form, which is the opposite of what is needed for an efficient 
representation of the solution via mesh refinement. 
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2.1. Equations of motion 

We consider a two-dimensional irrotational ideal fluid ll54l l55l l56l l57l bounded below by a flat wall and above 
by an evolving surface, rjix, t). Because the flow is irrotational, there is a velocity potential such that u = V0. The 
restriction of (p to the free surface is denoted (f{x, t) - (p(x, r](x, t), t). The equations of motion governing rjix, t) and 
ip(x, t) are 



TJ, - Cpy - Tj^Cpx 



1 2 1 2 O" 

2 2 - p 



(2.1a) 
(2.1b) 



Here g is the acceleration of gravity, p is the fluid density, cr > is the surface tension (possibly zero), and P is the 
projection to zero mean that annihilates constant functions. 



-2;r 



P = id-Po, 



1 r 

Piif=:r- fix)dx. 
271 Jo 



(2.2) 



This projection is not standard in (2.1b i, but yields a convenient convention for selecting the arbitrary additive constant 



in the potential. In fact, if the fluid has infinite depth and the mean surface height is zero, P has no efi'ect in (2.1b i at 
the PDE level, ignoring roundoff and discretization errors. The velocity components u - (p^, v - 0,, on the right hand 
side of (2.1 1 are evaluated at the free surface to determine rjt and (fi. The system is closed by relating in the fluid to 



77 and (f on the surface as the solution of Laplace's equation 

(pxx + 4>yy - 0, 
4>y = 0, 
= 



-h < y < Tj, 

y — —h. 



(2.3a) 
(2.3b) 
(2.3c) 



where h is the mean fluid depth (possibly infinite). We assume //(x, f) and u(x,y, f) are 27r-periodic in x. Applying a 
horizontal Galilean transformation if necessary, we may also assume is 27r-periodic in x. We generally assume h - Q 
in the finite depth case and absorb the mean fluid depth into rj itself. This causes -rjix) to be a reflection of the free 
surface across the bottom boundary, which simplifies many formulas in the boundary integral formulation below. The 
same strategy can also be applied in the presence of a more general bottom topography 



Equation ( 2. 1 a 1 is a kinematic condition requiring that particles on the surface remain there. Equation ( 2. lb 1 comes 



from to, - (pyTit + d>, and the unsteady Bernoulli equation, (p, + i|Vc6p + gy + - - c{t), where c(t) is constant in space 

z p 

but otherwise arbitrary. At the free surface, we assume the pressure jump across the interface due to surface tension is 
proportional to curvature, po - p\y=,j = ctk. The ambient pressure po is absorbed into the arbitrary function c(f), which 
is chosen to preserve the mean of ipix, t): 



cit)^f^+P, 
P 



1 2 1 2 o' 

rix4>x^y + ^ 0x - o 0y + ^'7 Ox 

11- p 



xJ 



(2.4) 



The advantage of this construction is that u = V(/> is time-periodic with period T if and only if 77 and <f are time- 
periodic with the same period. Otherwise, ip(x, T) could differ from (f(x, 0) by a constant function without affecting 
the periodicity of u. 

Details of our boundary integral formulation are given in AppendixjA] Briefly, we identify with C and parametrize 
the free surface by 

((a) = ^a) + ii]{^(a)), (2.5) 

where the change of variables x = ^(a) allows for smooth mesh refinement in regions of high curvature, and t has been 
suppressed in the notation. We compute the Dirichlet-Neumann operator li58J . 



I dcp 

Qipix) = y 1 + ri'(xY + iTj(x)) 



(2.6) 



which appears implicitly in the right hand side of (2.1 1 through cp^ and 0,., in three steps. First, we solve the integral 
equation 



-2n 



ll^ia) + ^ ^ [^1 («'^) + K2{a,/3)]ij(/3) d/3 = ^(^(a)) 



(2.7) 
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for the dipole density, /i(Q'), in terms of the (known) Dirichlet data (pi^ia)). Formulas for K\ and K2 are given in ( |2.13| l 
below. These kernels are smooth functions (even at a = j3), so the integral is not singular; see Appendix [A] Second, 
we differentiate ^(a) to obtain the vortex sheet strength, y{a) - fi'ia). Finally, we evaluate the normal derivative of 
at the free surface via 



1 r^" 

si ' 



[Gi(a,/3) + G2(a,/3)]y(/3)d/3 



(2.8) 



Gi and G2 are defined in (2.13 1 below, and H is the Hilbert transform, which is diagonal in Fourier space with symbol 
Hk = -isgn(k). The only unbounded operation in this procedure is the second step, in which y{a) is obtained from 
fj.(a) by taking a derivative. 

Once Qipix) is known, we compute (fix and 0,, on the boundary using 



1 / 1 -ri\x)\U'{x) 



(2.9) 



which allows us to evaluate (ZTajl and (2.1b 1 for rf, and (p,. Alternatively, one can write the right hand side of (2.1 
directly in terms of ip'ix) and Qip{x). 



2.2. GPU-accelerated time-stepping and quadruple precision 

Next we turn to the question of discretization. Because we are interested in studying large amplitude standing 
waves that develop relatively sharp wave crests for brief periods of time, we discretize space and time adaptively. 
Time is divided into v segments OiT, where 61 + ■ ■ ■ + 0y - 1 and T is the simulation time, usually an estimate of the 
period or quarter-period. In the simulations reported here, v ranges from 1 to 5 and each 0i was close to 1 /v (within a 
factor of two). On segment /, we fix the number of (uniform) timesteps, A^;, the number of spatial grid points, M/, and 
the function 



X" ( 1 - P[Ai sin'^(Q;/2)], to refine near x - n 



1 - P[Ai cos'*(af/2)], to refine near x — Q 



8(1 -p;) 
5 + 3p, ' 



(2.10) 



which controls the grid spacing in the change of variables x - ^i(a); see Figure[T] As before, P projects out the mean. 
The parameter pi lies in the range < p/ < 1 and satisfies 



PI 



mm{E,(0),E,(n)} 
max{£,(0), E,(7t)} ' 



rmn{E,(0),Ei(7T)} 



8pi 



5 + 3p; 



max{£,(0), Eiin)} 



5 + 3p/ 



(2.11) 



Note that pi - \ corresponds to uniform spacing while pi - corresponds to the singular limit where ^/ ceases to be a 
dififeomorphism at one point. This approach takes advantage of the fact that we can arrange in advance that the wave 
crests will form at x = and x = tt, alternating between the two in time. A more automated approach would be to 
have the grid spacing evolve with the wave profile, perhaps as a function of curvature, rather than asking the user to 
specify the change of variables. We did not experiment with this idea since our approach also allows the number of 
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Figure 1: Dependence of mesh spacing on the paiameter p (dropping the subscript /) in l |2.10) , with mesh refinement near x = n. (left) Plots of 
X = ^(a) for p = 0.0, 0.02, 0.04, 0.08, 0.25, 0.6 and 1.0. (center) E(a) = d^/da represents the grid spacing relative to uniform spacing. Comparison 
of E{a) and E(^^^{x)) shows how the grid points are re-distributed, (right) A magnified view near a = n shows that when p reaches 0, ^(a) ceases 
to be a difteomoiphism and £(f"'(jr)) forms a cusp. 
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grid points to increase in time, which would be complicated in an automated approach. We always set pi = 1 so that 
X = Q' on the first segment. Respacing the grid from segment / to / + 1 boils down to interpolating rj and tp to obtain 
values on the new mesh, e.g. t] o ^i^i(aj) - tj o ^/(^^"' o ^/+i(a^)), Oj - 2nj/Mi, which is straightforward by Newton's 
method. To be safe, we avoid refining the mesh in one region at the expense of another; thus, if < p/, we also 
require (Mi+i/Mi) > (5 + 3p;)/(5 + 3p/+i) so that the grid spacing decreases throughout the interval, but more so in the 
region where the wave crest is forming. 

Since the evolution equations are not stiff unless the surface tension is large, high order explicit time-stepping 



schemes work well. For each Runge-Kutta stage within a timestep on a given segment I, the integral equation (2.7 1 is 



solved by collocation using uniformly spaced grid points aj = 2jt j/Mi and the (spectrally accurate) trapezoidal rule, 





- K(ai,f5)fi(p)d/3^ —yK(ai,ajUaj). (2.12) 
'.n Jo ^' M) 



The matrices K/j - K{ai, aj)/Mi and G,j = G(a,-, aj)/Mi that represent the discretized integral operators in (2.7 1 and 



(2.8 1 are computed simultaneously and in parallel. The formulas are K{a,fS) = Ki{a,/3) + K2{a,P) and G(a,/3) - 



Gi{a,l3) + G2{a,l3) with 



Ki = Im — — cot cot — - — > , = Im -j — — cot 



2 2 2 2 J 2 2 

Gi = Re < cot cot > , G2 = Re < — ; — cot 



2 2 2 2 2 2 



(2.13) 



As explained in Appendix [A] these kernels have been regularized. Indeed, Ki(a,/3) and Gi(a,/3) are continuous 
at /? - a if we define Ki - - lm{(" {a) / [2(' (a)]} and Gi = Re{("(a)/[2^'{a)]}. These formulas are used when 



computing the diagonal entries Ku and G„. The terms cot((Qf, - aj)/2) in (2.13 1 are computed once and for all at the 



start. If the fluid depth is infinite, K2 and G2 are omitted. GMRES is used to solve (2.7 1 for//, which consistently takes 



4-30 iterations to reach machine precision (independent of problem size). In quadruple precision, the typical range is 



9-36 GMRES iterations. The FFT is used to compute yu' and Hy in (2.8 1, as well as 77', and ip' . 

We wrote 3 versions of the code, which differ only in how the matrices K and G are computed. The simplest 
version uses openMP parallel for loops to distribute the work among all available threads. The most complicated 
version is parallelized using MPI and scalapack. In this case, the matrices K and G are stored in block-cyclic layout 
||59l across the processors, and each processor computes only the matrix entries it is responsible for. The fastest 
version of the code is parallelized on a GPU in the cuda programming language. First, the CPU sends the GPU the 
vector ^(aj), which holds M/ complex numbers. Next, the GPU computes the matrices K and G and stores them in 
device memory. Finally, in the GMRES iteration, Krylov vectors are sent to the GPU, which applies the matrix K and 
returns the result as a vector. After the last Krylov iteration, the device also applies G to /i to help compute 0(p in (|2.8|l. 



Thus, communication with the GPU involves passing vectors of length M/, while 0{M^) flops must be performed on 
each vector passed in. As a result, communication does not pose a computational bottleneck, and the device operates 
at near 100% efficiency. We remark that the formula 

X + iy I [cos(;t:) -H cosh(y)]/[sin(;c) H- / sinh(y)], cos(jic) > 0, 
cot = < 

2 1 [sin(x) - / sinh(3;)]/[cosh(3;) - cos(ji:)], cos(x) < 

is relatively expensive to evaluate. Thus, it pays to compute K and G simultaneously (to re-use sin, cos, sinh, cosh 
results), and to actually store the matrices in device memory rather than re-compute the matrix entries each time a 
matrix- vector product is required. 

In double-precision, we evolve ( |2.1[ ) using Dormand and Prince's DOP853 scheme ll60l . This is a 13 stage, 8th 
order, "first same as last" Runge-Kutta method, so the effective cost of each step is 12 function evaluations. We 
apply the 36th order filter described in [61 1 to the right hand side of (le) and (If) each time they are evaluated in the 
Runge-Kutta procedure, and to the solution itself at the end of each time-step. This filter consists of multiplying the 
kth Fourier mode by 

exp [-36(|^|/^™ax)''] , A:^ax = M/2, (2. 14) 

which allows the highest-frequency Fourier modes to remain non-zero (to help resolve the solution) while still sup- 
pressing aliasing errors. To achieve truncation errors of order 10"^° in quadruple-precision, the 8th order method 



6 



requires too many timesteps. Through trial and error, we found that a 15th order spectral deferred correction (SDC) 
method |62, 63 64 1 is the most efficient scheme for achieving this level of accuracy. Our GPU implementation of 



quadruple precision arithmetic will be discussed briefly in Section 3.3 The variant of SDC that we use in this paper 
employs eight Radau Ila quadrature nodes |60|. The initial values at the nodes are obtained via fourth order Runge- 
Kutta. Ten correction sweeps are then performed to improve the solution to accuracy at the quadrature nodes. 

We use pure Picard corrections instead of the more standard forward-Euler corrections as they have slightly better 
stability properties. The final integration step yields a local truncation error of 0{h^^)\ hence, the method is 15th order. 
See [65 1 for more information about this variant of the SDC method and its properties. If one wished to go beyond 
quadruple-precision arithmetic, it is straightforward to increase the order of the time-stepping scheme accordingly. We 
did not investigate the use of symplectic integrators since our approach already conserves energy to 12-16 digits of 
accuracy in double precision, and 24-32 digits in quadruple precision. 

2.3. Translational and time-reversal symmetry 

In this paper, we restrict attention to symmetric standing waves of the type studied in ETl [131 l22l l23l l2l [3l 171 l28l . 
For these waves, it is only necessary to evolve the solution over a quarter period. Indeed, if at some time T/4 the 
fluid comes to rest {(p = 0), a time-reversal argument shows that the solution will evolve back to the initial state at 
T/2 with the sign of ip reversed. More precisely, the condition (p(x,T/4) - implies that ri(x,T/2) - r](x,Q) and 
(p(x, T/2) - -(f{x,Q). Now suppose that, upon translation by n, ri{x,0) remains invariant while (pix,Q) changes sign. 
Then we see that 771 (x, t) - rjix + n,T 12 + t) and ip\{x, t) - (f(x + n,T 12 + t) are solutions of with initial conditions 

77i(x,0) = ri(x + n, T 12) = ri{x + n,Q) = ?7(jc,0), 
if^ix, 0) = ipix + n, T/2) = -ifi(x H- TT, 0) = ifi(x, 0). 

Therefore, 771 - ri,ip\ - if, and 

ri{x, T) — Tjiix- n, T/2) — rj(x — n, T/2) — rj(x — 7t,0) — t]{x,0), 
<p(x, T) — ifii(x — 7T, T/2) — (p(x — 7T, T/2) — —<p{x — n,0) — <p(x, 0). 

Hence, 77 and ip are time-periodic with period T. It is natural to expect standing waves to have even symmetry when 
the origin is placed at a crest or trough and the fluid comes to rest. This assumption implies that 77 and (p will remain 



even functions for all time since 77, and ^, in (2.1 1 are even whenever 77 and ip are. Under all these assumptions, the 



evolution of 77 and (p from r/2 to T is a mirror image (about x - |orjc = of the evolution from to T/2. 

Once the initial conditions and period are found using symmetry to accelerate the search for time-periodic solu- 
tions, we double-check that the numerical solution evolved from to T is indeed time-periodic. Mercer and Roberts 
exploited similar symmetries in their numerical computations \2^^. 



3. Overdetermined shooting methods 

As discussed in the introduction, two-point boundary value problems governed by partial dififerential equations 
must be discretized before solving them numerically. However, truncation errors lead to loss of accuracy in the highest- 
frequency modes of the numerical solution, which can cause difficulty for the convergence of shooting methods. We 
will see below that robustness can be achieved by posing these problems as overdetermined nonlinear systems. 

In Section [3T| we define two objective functions with the property that driving them to zero is equivalent to finding 
a time-periodic standing wave. One of the objective functions exploits the symmetry discussed above to reduce the 
simulation time by a factor of 4. The other is more robust as it naturally penalizes high-frequency Fourier modes of 
the initial conditions. Both objective functions use symmetry to reduce the number of unknowns and eliminate phase 
shifts of the standing waves in space and time. The problem is overdetermined because the highest-frequency Fourier 
modes are constrained to be zero initially but not at the final time. Also, because T/4 often corresponds to a sharply 
crested wave profile, there are more active Fourier modes in the solution at that time than at f = 0. By refining the 
mesh adaptively, we include all of these active modes in the objective functions, making them more overdetermined. 
The idea that the underlying dynamics of standing water waves is lower-dimensional than predicted by counting active 
Fourier modes has recently been explored by Williams, et al. |66|. 



In Sections 3.2 and 3.3 we describe two methods for solving the resulting nonlinear least squares problem. The 



first is the Adjoint Continuation Method ll35l[36l[371[38ll . in which the gradient of the objective function is computed 
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by solving an adjoint PDE and the BFGS algorithm f39','40l is used for the minimization. The second is a trust-region 
approach in which the Jacobian is computed by solving the variational equation in parallel with multiple right-hand 
sides. This allows the work of computing the Dirichlet-Neumann operator to be shared across all the columns of the 
Jacobian. We also discuss implementation issues in quadruple precision on a GPU. 

3.1. Nonlinear least squares formulation 

In the symmetric standing wave case considered here, we assume the initial conditions are even functions satisfying 
ri{x + n,0) - rj{x, 0) and if(x + tt, 0) - -(p(x, 0). In Fourier space, they take the form 

UO) = C|,|, (k = ±2, ±4, ±6, . . . ; 1^1 < n), 
^,(0) = C|,|, (^ = ±1,±3,±5,... ; \k\<n), 

where ci, . . . ,c„ are real numbers, and all other Fourier modes of the initial conditions are set to zero. (In the finite 
depth case, we also set 770 - h, the mean fluid depth.) Here n is taken to be somewhat smaller than Mi, e.g. n » \Mi, 
where Mi is the number of spatial grid points used during the first A^i timesteps. (Recall that subscripts on M and 
refer to mesh refinement sub-intervals.) Note that high-frequency Fourier modes of the initial condition are zero- 
padded to improve resolution of the first n Fourier modes. 

In addition to the Fourier modes of the initial condition, the period of the solution is unknown. We add a zeroth 
component to c to represent the period: 

r = co. (3.16) 
Our goal is to find c e M"^' such that (p{x, T/4) = 0. We therefore define the objective function 

/(c) = ^r(c)^r(c) ^ f <fix,T/4fdx, n = v(^v(«,), T/4) y/EAad/M,, (3.17) 
2 47T Jo 

where v is the index of the final sub-interval in the mesh refinement strategy and the square root is a quadrature weight 
to approximate the integral via the trapezoidal rule after the change of variables x - ^y{a), dx - Ey(a) da. Note 
that r e M™ with m = My, which is usually several times larger than n, the number of non-zero initial conditions. 
The numerical solution is not sensitive to the choice of m and n as long as enough zero-padding is included in the 



initial condition to resolve the highest frequency Fourier modes. This will be confirmed in Section 4.4 through mesh- 
refinement studies and comparison with quadruple-precision computations. 

One can also use an objective function that measures deviation from time-periodicity directly: 



4n. 

When the underlying PDE is stiff" (e.g. for the Benjamin-Ono ||35l[36l or KdV equations), an objective function of the 



f(c)^^ r [T](x,T)-T^(x,0)f + [^(x,T)-^(x,0)fdx. (3.18) 
47r Jo 



form ( 3 . 1 8 1 has a key advantage over ( 3 . 1 7 1. For stiff problems, semi-implicit time-stepping methods are used in order 



to take reasonably large time-steps. Such methods damp high-frequency modes of the initial condition. This causes 



these modes to have Uttle eff'ect on an objective function of the form (3.17 1; thus, the Jacobian Jij - dri/dcj can be 



poorly conditioned if the shooting method attempts to solve for too many modes. By contrast, when implemented via 



(3.18 1, the initial conditions of high-frequency modes are heavily penalized for deviating from the damped values at 
time T. As a result, the Jacobian does not suffer from rank deficiency, and high-frequency modes do not drift far from 
zero unless doing so is helpful. Since the water wave is not stiff", we use explicit schemes that do not significantly 
damp high-frequency modes; therefore, the computational advantage of evolving over a quarter-period outweigh any 



robustness advantage of using (3.18 1 



We used symmetry to reduce the number of unknown initial conditions in p.l5| l. This has the added benefit of 
selecting the spatial and temporal phase of each solution in a systematic manner In problems where the symmetries of 
the solution are not known in advance, or to search for symmetry-breaking bifurcations, one can revert to the approach 
described in (35], where both real and imaginary parts of the leading Fourier modes of the initial condition were 
computed in the search for time-periodic solutions. To eliminate spatial and temporal phase shifts, one of the Fourier 
modes was constrained to be real and its time derivative was required to be imaginary. Constraining the time-derivative 
of a mode is most easily done with a penalty function f35'\. Alternatively, if two modes are constrained to be real and 
their time-derivatives are left arbitrary, it is easier to remove their imaginary parts from the search space than to use a 
penalty function. 
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Once phase shifts have been eliminated, the famihes of time -periodic solutions we have found appear to sweep out 
two-parameter families of solutions. To compute a solution in a family, we specify the mean depth and the value of 



one of the q in (3.15 1 or (3.16 1 and solve for the other cj to minimize the objective function. If / is reduced below 
a specified threshold (typically 10"^^ in double-precision or 10"^^ in quadruple precision), we consider the solution 
to be time-periodic. If / reaches a local minimum that is higher than the specified threshold, we either (1) refine the 
mesh, increase n, and try again; (2) choose a different value of q closer to the previous successful value; or (3) change 
the index k specifying which Fourier mode is used as a bifurcation parameter Switching to a different k is often useful 
when tracking a fold in the bifurcation curve. Since c e M"^' and one parameter has been frozen, / is effectively a 
function of n variables. 

We note that once n and the mesh parameters v, 0i, A/, Mi and Ni are chosen, /(c) is a smooth function that can 
be minimized using a variety of optimization techniques. Small divisors come into play when deciding whether / 
would really converge to zero in the mesh refinement limit (with ever increasing numerical precision). The answer 
may depend on whether the bifurcation parameters (770 and either q(a, 0) or one of the c/.) are allowed to vary within the 
tolerance of the current roundoff threshold each time the mesh is refined and the floating point precision is increased. 
While it is likely that small divisors prevent the existence of smooth families of exact solutions, exceedingly accurate 
approximate solutions do appear to sweep out smooth families, with occasional disconnections in the bifurcation 
curves due to resonance. 



3.2. Adjoint continuation method 

Having recast the shooting method as an overdetermined nonlinear least squares problem, we must now minimize 



the functional / in (3.17 1 or (3.18 1. The first approach we tried was the adjoint continuation method (ACM) developed 
by Ambrose and Wilkening to study time-periodic solutions of the Benjamin-Ono equation |f 35] [361 and the vortex 
sheet with surface tension ||37l . The method has also been used by Williams et al. to study the stability transition from 
single-pulse to multi-pulse dynamics in a mode-locked laser system ll38l . 

The idea of the ACM is to compute the gradient of / with respect to the initial conditions by solving an adjoint 
PDE, and then minimize / using the BFGS method |39 40 1. BFGS is a quasi-Newton algorithm that builds an 
approximate (inverse) Hessian matrix from the sequence of gradient vectors it encounters on successive line-searches. 



In more detail, let q - (77, ip) and denote the system (2.1 1 abstractly by 



We define the inner product 



q,^F{q), q{x,Q) = qo(x). 



1 r^" 

,q2)^ — j [r]i(x)r]2(x) + (fi {x)(p2 (x)] dx 



(3.19) 



(3.20) 



so that / in ( 3.17| i, written now as a function of the initial conditions and proposed period, which themselves depend 
on c via ( |3.15 1 and ( |3.16| l, takes the form 



/(?o,r)= ^ll(0,^(-,r/4))||2 = -1 r <f(x,T/4fdx, 
2 471 Jo 



(3.21) 



where q - (ri,ip) solves (3.19i. The case with / of the form (3.18 1 is similar, so we omit details here. In the course 
of minimizing /, the BFGS algorithm will repeatedly query the user to evaluate both /(c) and its gradient Vc/(c) at a 
sequence of points c e M"^'. The T derivative, df/dco, is easily obtained by evaluating 



dT 



-In 



1 r 

= — ^(x,r/4)^,(x,r/4)flfx 

87!- Jo 



(3.22) 



using the trapezoidal rule after changing variables, x - ^v(Q'), dx - Ey{a)da. Note that (fi{-, T/4) and (f,{-, T/4) are 
already known by solving ( |2.1| i. One way to compute the other components of V^/, say df/dc^, would be to solve the 
variational equation, (written abstractly here and explicitly in Appendix [B| 



q, = DF(q(-, t))q, q(x, 0) = qoix) 



with initial conditions 



qo(x) = 



((gik^ + e-'''\Q), ;t= 1,3,5, 



(0, 



+ €-•'"'), k^ 2,4,6, 



(3.23) 



(3.24) 
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to obtain 



dck de 



f(qo + sqo, T) = < (0, ^i; T/4)) , (0, <P(; T/4)) >. 



(3.25) 



Note that a dot denotes a directional derivative with respect to the initial condition, not a time-derivative. To avoid 



the expense of solving ( 3.23 1 repeatedly for each value of k, we solve a single adjoint PDE to find 6f/6qo such that 
f - {Sfl Sqo , qo)- From ( 3.25 i, we have 



/ = < (0, ^i; T/4)) , (fji; T/4), Cp{; T/4)) ) = < §0 , q{; T/4) >, 



(3.26) 



where we have defined qo - (ijo, (fio) with = and = ^P(s T/4). Note that replacing by r](-, T/4) did not affect 
the inner product. Next we observe that the solution q(x, s) of the adjoint equation 



q, = DFiqi;T/4-s)rq, qi;Q) ^ qo, 
which evolves backward in time {s - T/4 - t), has the property that 

{q(:, T/4 - f), q{-, f)) = const. 
Setting t - T /A shows that this constant is actually /. Setting f = gives the form we want: 



/ = < 5f/Sqo , qo). 



5f 



q(;T/4). 



(3.27) 



(3.28) 



(3.29) 



From ( |3.24| l, we obtain 



dl 
dck 



2Rt{r,l{T/4)], 
2Re{^^(r/4)), 



/t= 1,3,5,. 
A; = 2,4,6,. 



(3.30) 

Together with (|3.22|i, this gives all the components of V^/ at once. Explicit formulas for the linearized and adjoint 



equations (3.23 i and (3.27 1 are derived in Appendix |B] 



Like (3.23 1, the adjoint equation (3.27i is linear, but non-autonomous, due to the presence of the solution q(t) of 
p.l9| l in the equation. In the BFGS method, the gradient is always called immediately after computing the function 
value; thus, if q(t) and q,{f) are stored in memory at each timestep in the forward solve, they are available in the adjoint 
solve at intermediate Runge-Kutta steps through cubic Hermite interpolation. We actually use dense output formulas 
ll67l l60l for the 5th and 8th order Dormand-Prince schemes since cubic Hermite interpolation limits the accuracy 
of the adjoint solve to 4th order, but the idea is the same. If there is insufficient memory to store the solution at 
every timestep, we store the solution at equally spaced mile-markers and re-compute q between them when q reaches 
that region. Thus, V/ can be computed in approximately the same amount of time as / itself, or twice the time if 
mile-markers are used. 

It is worth mentioning that, when discretized, the values of tj and are stored on a non-uniformly spaced grid for 
each segment I e {2, . . . , v) in the mesh-refinement strategy. The adjoint variables rj, if are stored at the same mesh 
points, and are initialized by 

^0 ° &(ai) =0, ^0 ° &(ai) = Vi^iad, T/4), 



with no additional weight factors needed. This works because the inner product (3.20i is defined with respec t to x 
rather than a, and the change of variables has been accounted for by the factor -\/Ey(ai)/My in the formula ( 3. 17 1 for /. 



5.5. Trust-region shooting method 

While the ACM method gives an efficient way of computing the gradient of /, it takes many line-search iterations 
to build up an accurate approximation of the Hessian of /. This misses a key opportunity for parallelism and re-use of 
data that can be exploited if we switch from the BFGS framework to a Levenberg-Marquardt approach f40l. Instead 



of solving the adjoint equation (3.27 i to compute V/ efficiently, we solve the variational equation (3.23 1 with multiple 



right-hand sides to compute all the columns of the Jacobian simultaneously. From (3.17 i, we see that 



Jik - 



dn 

dck 



I <p,(Uad, T/4) ^JEy(a^/My, k ^ 0, 
<f(Uai), T/4) y/Ey(ai)/My, k>l. 



(3.31) 
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where qo is initialized as in p.24| l for k > \. We avoid the need to store q at every timestep (or at mile-markers) by 
evolving q along with q rather than interpolating q: 

dt\q) \DF(q)qj' q(0) ^ qo ^ dqo/dc,. 



In practice, we replace q in (3.32 1 by the matrix Q - [q(k=\), . . . , q(k=n)] to compute all the columns of J (besides k - 0) 



at on ce. The linearized equations (|B.l|i involve the same Dirichlet-to-Neumann operator as the nonlinear equations 



(2.1 1, so the matrices K and G in (2.7 1 and (2.8 1 only have to be computed once to evolve the entire matrix Q through 
a Runge-Kutta stage. Moreover, the linear algebra involved can be implemented at level 3 BLAS speed. For large 
problems, we perform an LU-factorization of K, the cost of which is made up for many times over by replacing 
GMRES iterations with a single back-solve stage for each right-hand side. In the GPU version of the code, all the 
linear algebra involving K and G is performed on the device (using the CULA library). As before, communication 
with the device is minimal in comparison to the computational work performed there. 

We emphasize that the main advantage of solving linearized equations is that the same DNO operator is used for 
each column of 2 in a given Runge-Kutta stage. This opportunity is lost in the simpler approach of approximating J 



through finite differences by evolving ( 3. 19 1 repeatedly, with initial conditions perturbed in each coordinate direction: 
^^^^ n(c + sek)-n(c) ^ = (0,..., 0,1,0,..., 0)^eM"+'. (3.33) 

E 

Thus, while finite differences can also be parallelized efliciently by evolving these solutions independently, the matrices 
K and G will be computed n times more often in the finite difference approach, and most of the linear algebra will 
drop from running at level 3 BLAS speed to level 2. Details of our Levenberg-Marquardt implementation are given in 
Appendix |C] where we discuss how to re-use the Jacobian several times rather than re-computing it each time a step 
is accepted. 

The CULA and LAPACK libraries could not be used for quadruple precision calculations, and we did not try 
FFTW in that mode. Instead, we used custom FFT and linear algebra libraries (written by WiUcening) for this purpose. 
However, for the GPU, we did not have any previous code to build on. Our solution was to write a block version of 
matrix-matrix multiplication in CUDA to compute residuals in quadruple precision, then use iterative refinement to 
solve for the corrections in double-precision, using the CULA library. Although quadruple precision is not native on 
any current GPU, we found M. Lu's gqd package 16811 . which is a CUDA implementation of Bailey's qd package ||69l , 
to be quite fast. Our code is written so that the floating point type can be changed through a simple typedef in a header 
file. This is possible in C-H-i- by overloading function names and operators to call the appropriate versions of routines 
based on the argument types. 

4. Numerical results 



This section is organized as follows: In Section 4.1 we use the Adjoint Continuation Method to study standing 
waves of wavelength In in water of uniform depth h - \. Several disconnections in the bifurcation curves are 
encountered, which are shown to correspond physically to higher-frequency standing waves superposed (nonlinearly) 
on the low-frequency carrier wave. In Section |4.2| we use the trust-region approach to study a nucleation event 
in which isolated large-amplitude solutions, and closed loops of such solutions, suddenly exist for depths below a 
threshold value. This gives a new mechanism for the creation of additional branches of solutions (besides harmonic 
resonance |I3] ID). In Section 4.3 we study a "Wilton ripple" phenomenon f70] [TTl IS) |72l |73l in which a pair of 



"mixed mode" solutions bifurcate along side the "pure mode" solutions at a critical depth. Our numerical solutions are 
accurate enough to identify the leading terms in the asymptotic expansion of these mixed mode solutions. Following 
the mixed-mode branches via numerical continuation reveals that they meet up with the pure mode branches again at 



large amplitude. We also study how this degenerate bifurcation splits when the fluid depth is perturbed. In Section 4.4 
we study what goes wrong in the Penney and Price conjecture, which predicts that the limiting standing wave of 
extreme form will develop sharp 90 degree corner angles at the wave crests. We also discuss energy conservation, 



decay of Fourier modes, and validation of accuracy. In Section 4.5 we study collisions of counter-propagating solitary 
water waves that are elastic in the sense that the background radiation is identical before and after the collision. In 
Section 4.6 we study time-periodic gravity-capillary waves of the type studied by Concus f23l and Vanden-Broeck 



IfTOl using perturbation theory. Finally, in Section 4.7 we compare the performance of the algorithms on a variety of 



parallel machines, using MINPACK as a benchmark for solving nonlinear least squares problems. 
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4.1. Standing waves of unit depth 

We begin by computing a family of symmetric standing waves with mean fluid depth fjo - h - \ and zero surface 
tension. The linearized equations about a flat rest state are 

f^,=gCp, ^,^P[-grj], [Q[e''"']^[kiMh\e''"'). (4.34) 

Thus, the linearized problem has standing wave solutions of the form 

rj - A sin ut cos kx, (p - Bcosojtcoskx, a/ - kgtanhkh, A/B- -^Jik/g) tanhfe/j. (4.35) 

Setting h = I, g - 1, k - 1, these solutions have period T - Itt/oj x 7.19976. Here B is a free parameter controlling 
the amplitude, and A is determined by A/B - Vtanh 1. 

To find time-periodic solutions of the nonlinear problem, we start with a small amplitude linearized solution 



as an initial guess. Holding ci = i^i(O) constant, we solve for the other q in (3.15 1 using the ACM method of 



Section 3.2 Note that ci = B in the linearized regime. We then repeat this procedure for another value of ci to obtain 
a second small-amplitude solution of the nonlinear problem. The particular choices we made were ci - -0.001 and 
ci - -0.002. We then varied ci in increments of -0.001, using linear extrapolation from the previous two solutions 
for the initial guess. The results are shown in Fig. [2] The two representative solutions labeled A and B show that the 
amplitude of the wave increases and the crest sharpens as the magnitude of ci increases. We chose ci to be negative 
so the peak at T/4 would occur at x - n rather than x = 0. An identical bifurcation curve (reflected about the T-axis) 
would be obtained by increasing ci from to positive values. The solutions A and B would then be shifted by n in 
space. 

For most values of ci between 0.0 and -0.23, the ACM method has no difficulty finding time-periodic solutions 
to an accuracy of / < 10"^^. However, at cj = -0.201, the minimum value of / exceeds this target. On further 
investigation, we found there was a small gap, ci e (-0.20113,-0.20124), where we were unable to compute time- 
periodic solutions even after increasing M from 256 to 512 and decreasing the continuation stepsize to Aci = 1.0x10"^. 
By plotting other Fourier modes of the initial conditions versus the period, we noticed that the 9th mode jumps 
discontinuously when ci crosses this gap. A similar disconnection appears to be developing near solution B. 

Studying the results of Fig. [2] we suspected we could find additional solutions by back-tracking from B to the 
region of the bifurcation curve around cg - -7.0 x 10"^ and performing a large extrapolation step to cg ^ -1.0 x 10"^, 
hoping to jump over the disconnection at B. This worked as expected, causing us to land on the branch that terminates 
at G in Fig|3] We used the same technique to jump from this branch to a solution between E and D. We were unable 
to find any new branches beyond C by extrapolation from earlier consecutive pairs of solutions. 

Next we track each solution branch as far as possible in each direction. This requires switching among the q 
as bifurcation parameters when traversing different regions of the solution space. The period, cq - T, is one of the 
options. We also experimented with pseudo-arclength continuation 1321 [311 l4l. but found that it is necessary to re- 
scale the Fourier modes to successfully traverse folds in the bifurcation diagram. This requires just as much human 
intervention as switching among the q, so we abandoned the approach. The disconnections at A and B turn out to 
meet each other, so that B is part of a closed loop and A is connected to the branch containing G. We stopped at G, F, 
C because the computations became too expensive to continue further with the desired accuracy of / < lO"^*" using 
the adjoint continuation method. 




Figure 2: A family of standing water waves of unit deptli (h = 1) bifurcates from tlie stationary solution at 7" = In/ Vtanh 1 a; 7.200. We 
used the ACM method to track the family out of the linearized regime via numerical continuation. The period initially decreases with amplitude, 
but later increases to surpass the period of the linearized standing waves. A resonance near solution A causes the 9th Fourier mode of <p to jump 
discontinuously as the period increases. This resonance has little effect on the first Fourier mode. 
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Figure 3: Several branches of standing waves were found by extrapolation across disconnections in the bifurcation curves. These disconnections 
are caused by resonant modes that may be iiiteipreted physically as high-frequency standing waves superposed (nonlinearly) on the low-frequency 
carrier wave. 




Figure 4: Bifurcation diagram showing c^ = ip7(0) versus T for standing waves of unit depth, along with snapshots of the evolution of tfi(x, t) 
for three of these solutions. A secondary standing wave with wave number k = 1 can be seen visibly superposed on (p(x, 0) in solution F, which 
corresponds to the large value of c-i at F in the diagram. 



The use of Fourier modes of the initial conditions in the bifurcation diagrams is unconventional, but yields insight 
about the effect of resonance on the dynamics of standing waves. We observe experimentally that disconnections 
in the bifurcation curves correspond to higher-frequency standing waves appearing at the surface of lower-frequency 
carrier waves. Because the equations are nonlinear, only certain combinations of amplitude and phase can occur. We 
generally see two possible solutions, one in which the high and low-frequency component waves are in phase with 
each other, and another where they are out of phase. For example, solutions F and G in Fig. |3]can both be described as 
a ^ = 7 wave-number standing wave oscillating on top of a A: = 1 carrier wave, but the smaller wave sharpens the crest 
at F and flattens it at G, being 180 degrees out of phase at F versus G when the composite wave comes to rest. (All the 
standing waves of this paper reach a rest state at f = r/4, by construction. Other types of solutions will be considered 
in future work B2l .) In Section 4.3 we show that this disconnection between branches F and G is caused by a (3,7) 



harmonic resonance at fluid depth h - \ .0397, where the period of the k - \ mode is equal to 3 times the period of the 
k -1 mode for small-amplitude waves l|4]|28l. 

In Fig.|4j we plot cy = ^7(0) versus T, along with the evolution of ipix, t) for several solutions over time. Note that 
the scale on the y-axis is 20 times larger here (with cj) than in Fig. [3] (with cg). This is why the secondary standing 
waves in the plots of solutions F and G appear to have wave number k -1 . We also note that the disconnections at A 
and B are nearly invisible in the plot of cy vs T. This is because the dominant wave number of these branches isk -9. 
Similarly, it is difficult to observe any of the side branches in the plot of ci vs T in Fig. [3] since they all sweep back 



and forth along nearly the same curve. We will return to this point in Section 4.3 



4.2. Nucleation of imperfect bifurcations 

We next consider the effect of fluid depth on these bifurcation curves. We found the ACM method was too slow 
to perform this study effectively, which partly motivated us to develop the trust region shooting algorithm. As shown 
in Fig. [5] if the fluid depth is increased from h - 1.0 to /z = 1.05, it becomes possible to track branches F and G to 
completion. The large amplitude oscillations in the 7th Fourier mode eventually die back down when these branches 
are followed past the folds at C7 +4 x lO"'' in Fig. [s] The branches eventually meet each other at an imperfect 
bifurcation close to the initial bifurcation from the zero-amplitude state to the k - \ standing wave solutions. This 



imperfect bifurcation was not present at /; = 1. Its nucleation will be investigated in greater detail in Section 4.3 The 



small bifurcation loops at A and B in Fig. |3]have disappeared by the time h - 1.05. If we continue to increase h to 
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Figure 5: If the mean depth, h, is increased from 1.0 to 1.05, the loop structure between A and B in Fig.[3]disappears, and branches F and G meet 
each other a second time at another imperfect bifurcation. As h increases further, these loops shrink, disappearing completely by the time h = 1.09. 



1.07, the top wing of the S-shaped bifurcation loop breaks free from the bottom wing and forms a closed loop. This 
loop disappears by the time h reaches 1.08. By h - 1.09, the k -1 resonance has all but disappeared. 

In Fig.|2] we saw that the period of standing waves of unit depth decreases to a local minimum before increasing 
with wave amplitude. Two of the plots of Fig. [5] show that this remains true for h - 1.05, but not for h - 1.07. In the 
latter case, the period begins increasing immediately rather than first decreasing to a minimum. This is consistent with 
the asymptotic analysis of Tadjbakhsh and Keller Il22l . which predicts that 



(jJo + ^£^<^2 + O(e^), Wp=tanh/i, (^2 = 3^(9wq^ - 12wq-' - Swq - 2wg), (4.36) 



where e controls the wave amplitude, and agrees with A in (4.35 i to linear order. The coiTection term (l)2 is positive 
for h < 1.0581 and negative for h > 1.0581. 



We will see in Section 4.3 that the nucleation of bifurcation branches between h - 1 .09 and h - 1 .0 is partly caused 
by a (3, 7) harmonic resonance (defined below) at fluid depth h - 1.0397. As this mechanism is complicated, we also 
looked for simpler examples in deeper water. The simplest case we found is shown in Fig. |6] For fluid depth h - 2, 
we noticed a pair of disconnections in the bifurcation curves that were not present for h -2.1. The 23rd Fourier mode 
of the initial condition exhibits the largest deviation from on the side branches of these disconnections. However, 
as discussed in the next section, this is not caused by a harmonic resonance of type (m, 23) for some integer m. To 
investigate the formation of these side branches, we swept through the region 6.64 < T < 6.68 with slightly different 
values of h, using co = T as the bifurcation parameter As shown in Fig. |6] when h = 2.0455, the bifurcation curve 
bulges slightly but does not break. As h is decreased to 2.045, a pair of disconnections appear and spread apart from 
each other. We selected h - 2.0453 as a good starting point to follow the side branches. As we hoped would happen, 
the two red side branches in the second panel of the figure met up with each other (at C23 ~ 7 x 10"^), as did the two 
black branches (at C23 ^ -7 x 10"^). We switched between T and C23 as bifurcation parameters to follow these curves. 
We then computed two paths (not shown) in which C23 = ±4x 10"^ was held fixed as h was increased. We selected 4 of 
these solutions to serve as starting points to track the remaining curves in Figure|6j which have fluid depths /12 through 
/15 given in the figure. We adjusted /12 to achieve a near three-way bifurcation. This bifurcation is quite difficult to 
compute as the Hessian of / becomes nearly singular; for this reason, some of the solutions had to be computed in 
quadruple precision to avoid falling off the curves. Finally, to find the points A and B where a single, isolated solution 
exists at a critical depth, we computed /i as a function of (T, C23) on a small 10 x 10 grid patch near A and B, and 
maximized the polynomial interpolant using Mathematica. 



4.3. Degenerate and secondary bifurcations due to harmonic resonance 

In this section, we explore the source of the resonance between the k - \ and k - 1 modes in water of depth h 
close to 1. While harmonic resonances such as this have long been known to cause imperfect bifurcations L3. ,4., .281 . 
we are unaware that anyone has been able to track the side-branches all the way back to the origin, where they meet 
up with mixed-mode solutions of the type studied asymptotically by Vanden-Broeck fTOl and numerically by Bryant 
and Stiassnie [6J. In the traveling wave case, such mixed-mode solutions are known as Wilton's ripples ITTI 1731 . 
When the fluid depth is perturbed, we find that the degenerate bifurcation splits into a primary bifurcation and two 
secondary bifurcations f74l. This is consistent with Bridges' work on perturbation of degenerate bifurcations in three- 
dimensional standing water waves in the weakly nonlinear regime 1721 . 
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'(323(0) vs T -^23(0) vs T r,{x,t) n,{x,t) 




hi = 2.0453 /!2 = 2.0453927 = 2.0454895 hi = 2.0455797 ft,, = 2.0456368 /!,4 = 2.0456520 h„ = 2.0456493 



Figure 6: A pair of imperfect bifurcations were found to coalesce as fluid depth increases, leaving behind two closed loops and a smooth bifurcation 
curve running between them. The loops each shrink to a point and disappear as fluid depth continues to increase. Reversing the process shows that 
isolated solutions can nucleate new branches of solutions as fluid depth decreases, (right) The nucleated solutions A and B are nearly identical on 
large scales, but contain secondary, high-frequency standing waves at smaller scales that are out of phase with each other. These small oscillations 
become visible when the slope of the wave profile is plotted. 



We begin by observing that the ratio of the periods of two small-amplitude standing waves is 



If we require m to be an integer and set ki 



Ti (L>2 
I — = : 

1 , we obtain 



k2 tanh 
ki tanh^i/j 



k2 tanh k2h - m tanh h. 



(4.37) 



(4.38) 



Following 13]|4]|28:|, we say there is a harmonic resonance of order {m, ^2) if h satisfies (4.38 1. At this depth, linearized 
standing waves of wave number k = 1 have a period exactly m times larger than standing waves of wave number k - k2. 
This nomenclature comes from the short-crested waves literature |751|76l; a more general framework can be imagined 
in which k\ is not assumed equal to 1 and m is allowed to be rational, but we do not need such generality. 

We remark that the nucleation event discussed in the previous section does not appear to be connected to a harmonic 
resonance. In that example, the fundamental mode must have a fairly large amplitude before the secondary wave 
becomes active, and the secondary wave is not a clean k - 21> mode. Also, no integer m causes the fluid depth of an 
(m, 23) resonance to be close to 2.045. The situation is simply that at a certain amplitude, the ^ = 1 standing wave 
excites a higher-frequency, smaller amplitude standing wave that oscillates at its surface. It is not possible to decrease 
both of their amplitudes to zero without destroying the resonant interaction in this case. 

We now restrict attention to the (3, 7) harmonic resonance. Setting m 



1 tanh 7/1 = 9 tanh h, h>0 



3 and /ta = 7 in ( |438| ) yields 
/jcrit ~ 1.0397189. 



(4.39) 



In the nonlinear problem, when the fluid depth has this critical value, we find that the k = 1 and k - I branches persist 
as if the other were not present. Indeed, the former can be computed as a family of ^ = 1 solutions on a fluid of depth 
7h. The latter can be computed by taking a pure k - 1 solution of the linearized problem as a starting guess and solving 
for the other Fourier modes of the initial conditions, as before. When this is done, after setting e = ^i(O), we find that 
ipiiO) = O(e^), ipsiO) - O(e^), ipjiQ) = O(e^), and 1^9(0) = Oie'). To obtain these numbers, we used 10 values of e 
between 10"^ and lO^^* and computed the slope of a log-log plot. The calculations were done in quadruple precision 
with a 32 digit estimate of h^it to avoid corruption by roundoff error If we repeat this procedure with h - \ .0, we find 
instead that 1^7(0) - Oie'), (f<)(Q) - 0{e^). Thus, the degeneracy of the bifurcation at h„{i appears to slow the decay 
rate of the 7th and higher modes, but not enough to affect the behavior at linear order. 

We were surprised to discover that two additional branches also bifurcate from the stationary solution when h - 
hail- For these branches, we find that <pk(0) - Oie^), where the first several values of p are 
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Figure 7: Perturbation of this degenerate bifurcation causes a pair of imperfect (h > /icrit) or perfect (/; < h^fn) secondary bifurcations to form. Red 
markers are the solutions actually computed, while blue markers correspond to the same solutions, phase-shifted in space by n. 



These numbers were computed as described above, with e ranging between 10 and 10"^. To get a clean integer for 
^2i(0), ^27(0) and ^29(0), we had to drop down to the range 10"^ < e < lO"'*. Using the Aitken-Neville algorithm 11771 
to extrapolate ^i(0)/e^ to e = 0, we obtain the leading coefficients for the two branches: 
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In summary, there are four families of solutions of the nonlinear problem that bifurcate from the stationary solution. 
In the small amplitude limit, they approach a pure k - I mode, a pure k = 7 mode, and two mixed modes involving 
both ^ = 1 and k - 7 wave numbers. For convenience, we will refer to these branches as "pure" and "mixed" based 
on their limiting behavior in the linearized regime. The mixed mode solutions are examples of the Wilton's ripple 
phenomenon fTOl 1711 173)1 in which multiple wavelengths are present in the leading order asymptotics. 

When these four branches are tracked in both directions, we end up with eight rays of solutions emanating from the 
equilibrium configuration, labeled a-h in Figure [7] Rays a and b consist of pure k - 1 mode solutions, with negative 
and positive amplitude, respectively, where amplitude refers to ^i(O). Rays e and h are the pure k - 7 mode solutions, 
and rays c,d,f,g are the mixed mode solutions. It is remarkable that rays a and f, as well as b and c, are globally 
connected to each other by a large loop in the bifurcation diagram. By contrast, for the Benjamin-Ono equation 17^ . 
additional branches of solutions that emanate from a degenerate bifurcation belong to different levels of the hierarchy 
of time-periodic solutions than the main branches; thus, solutions on the additional branches have a different number 
of phase parameters, and cannot meet up with one of the main branches without another bifurcation. 

We now investigate what happens to these rays when the fluid depth is perturbed. When h increases from /icrit 
to 1.04, rays e and h (the pure k - 7 solutions) break free from the other 6 rays. An imperfect bifurcation forms 
on rays a and b, linking the former to f and d, and the latter to c and g. Aside from this local reshuffling of branch 
connections near the stationary solution, the global bifurcation structure of h = 1.04 is similar to h = hait- In the other 
direction, when h - 1.03 < /lait, rays a and b (the pure k - 1 solutions) disconnect from the other rays. Instead of 
forming imperfect bifurcations as before, rays c and d separate from the ^7=0 axis, but remain connected to ray e 
through a perfect bifurcation. The same is true of rays f, g and h. Thus, we have identified a case where perturbing 
a degenerate bifurcation causes it to break up into a primary bifurcation and two secondary bifurcations |74|, either 
perfect (h < /icrit) or imperfect (h > /icrit)- 

The reason one is perfect and the other is not can be explained heuristically as follows. All the solutions on the 
pure k - 7 branch have Fourier modes (fk(t), with k not divisible by 7, exactly equal to zero. These modes can be 
eliminated from the nonlinear system of equations by reformulating the problem as a A; = 1 solution on a fluid of depth 
7h. This reformulation removes the resonant interaction by restricting the k - 1 mode (in the original formulation) 
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!z 2k n 2jT Tt 2jT tt 2ji jr 2n 

Figure 8: Solutions labeled F, G, H and K in Fig.^show the transition from standing waves that form crests at .v = ;r to those that form crests 
at ;i: = when f = T/4. This transition occurs where branch f meets branch g in the h < h^nt case. For h > hcdt, branch f meets branch d at an 
imperfect bifurcation, and the solution at the end of path d would resemble solution K, shifted in space by n. 



(^7(0) vs T f>j{0) vs T 




7.1 7.15 7.2 7.25 7.3 7.35 7.4 7.1238 7.12393 7.1 7.15 7.2 7.25 7.3 7.35 7.4 7.14202 7.14204 

Figure 9: Solutions O and P, the bifurcation points from the equilibrium state to the pure k = 1 and k = 7 standing waves, respectively, separate 
from each other through a change in period as fluid depth varies from h = /icii. The imperfect bifurcation can occur arbitrarily close to solution O 
by taking h \ h^^n. Turning points in 1^7(0) and T occur at solution H for /) < /jciit- 



to remain zero. The simplest way for this mode to become non-zero, i.e. deviate from rays e,h in Fig. [Tj is through 
a subharmonic bifurcation (with A: = 7 as the fundamental wavelength) in which the Jacobian 7 in (3.31 1 develops a 
non-trivial kernel containing a null vector c e P{N) with ci - ^i(O) 0. Here c contains the even modes of 77 and the 
odd modes of ^ at f - 0, as in (3.15 1, but with « = 00. If such a kernel exists, one expects to be able to perturb the 
solution in this direction, positively or negatively, to obtain a pitchfork bifurcation. By contrast, solutions on the k - I 
branch develop non-zero higher-frequency modes through non-hnear mode interactions. So while ci = on the k -1 
branch, C7 on the k - 1 branch. Since there is no way to control the influence of the 7th mode, e.g. by constraining 
it to be zero, there really is no "pure" k = 1 branch to bifurcate from, and the result is an imperfect bifurcation. 

The fact that ray f is connected to g for h < hem, and to d for h > h^rn, has a curious eff'ect on the form of the 
numerical solution at the end of the red branch, the branch of solutions actually computed, in Fig. [7] In the former 
case, ci changes sign from branch f to g, and we end up at solution K in Fig. |8] which forms a wave crest at x = 
at f - T/4. In the latter case, ci remains negative from branch f to d (or branch a to d if the imperfect bifurcation is 
traversed without branch jumping) and we end up at a solution similar to K, but phase shifted, so that a wave crest 
forms at X = TT at f = T/4-. Note that the sign of ci determines whether the fluid starts out flowing toward x = tt and 
away from x = 0, or vice-versa. 

The transition from wave crests at x = tt to wave crests at x = when f - T/4 is shown in Fig. [8] Solutions F, 
G and H may all be described as A = 7 standing waves superposed on A: = 1 standing waves. Note that solution F 
bulges upward at x - tt when t - T/4, while solution G bulges downward there. A striking feature of these plots is 
that the A = 7 modes of (p and 77 nearly vanish at f = T/12 and t - T/6, respectively. This occurs because the A = 7 
mode oscillates 3 times faster than the k - I mode. Solution H is a pure k - 7 solution, which means (/?(x, t) vanishes 
identically at f = (y), m > 0, while 777(f) passes through zero at f = ^ (0, m > 0. Since solutions F and G are 
close to solution H, 1^7(0 and 777(f) pass close to zero at these times, leading to smoother solutions dominated by the 
first Fourier mode at these times. 

Figure [9] shows how the period varies along each of these solution branches. The period varies with fluid depth 
more rapidly for solutions on the k = 1 branch than on the k = 7 branch since the slope of tanh h is 18000 times larger 
than that of tanh7/i when /z » 1. As a result, bifurcation point O (to the k = I branch) moves visibly when h changes 
from 1.03 to 1.04, while bifurcation point P (to the k - 7 branch) hardly moves at all. We also see that the period 
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Figure 10: Snapshots of several standing waves over a quarter period in water of infinite depth, along with bifurcation diagrams showing where 
they fit in. (center) Conventional bifurcation diagram showing wave height versus crest acceleration. The turning point at A was discovered by 
Mercer and Roberts [2] while the turning point at C and subsequent bifurcation structure were discovered by Wilkening (17]. The wave height h 
eventually exceeds the local maximum at A. (right) The continuation parameters actually used were ci, cj, cjo and T rather than ft or A^. 



increases with amplitude on branches e and h. By contrast, on branches a and b, it decreases to a local minimum 
before increasing with amplitude. This is consistent with the asymptotic analysis of Tadjbakhsh and Keller discussed 



previously; see (4.36 1 above. Finally, we note that both T and C7 have a turning point at solution H, the bifurcation 
point connecting branches f and g to branch h. This causes paths f and g to lie nearly on top of each other for much 
of the bifurcation diagram. Other examples of distinct bifurcation curves tracing back and forth over nearly the same 
paths are present (but difficult to discern) in Figures [7] and [9] when h = 1.03, and will be discussed further in Section[5] 

4.4. Breakdown of self- similarity and the Penney and Price conjecture 

In Sections [4. 2| and [431 above, we have seen that increasing the fluid depth causes disconnections in the bifurcation 
diagrams to "heal," and it is natural to ask if any will persist to the infinite depth limit. The answer turns out to 
be yes, which is not surprising from a theoretical point of view since infinite depth standing waves are completely 
resonant |20|, involving state transition operators with infinite dimensional kernels and a small-divisor problem on 
the complement of this kernel. Nevertheless, examples of such disconnecitons have only recently been observed 
in numerical simulations flTl, and show no evidence of being densely distributed along bifurcation curves. In this 
section, we expand on the results of 1 17 1, filling in essential details and providing new material not discussed there. 
The scarcity of observable disconnections will be discussed further in the conclusion section. 

The main question addressed in |17| is whether standing waves of extreme form approach a limiting wave profile 
with a geometric singularity at the wave crest when the bifurcation curve terminates. This type of question has a long 
history, starting with Stokes ll79l[80l . who predicted that the periodic traveling wave of greatest height would feature 
wave crests with sharp, 120° interior crest angles. While there are some surprises concerning oscillatory asymptotic 
behavior at the crest of the almost highest traveling wave ||8Tll82l[r7ll . it has been confirmed both theoretically [83] 
and numerically Il84l l85l that a limiting extreme traveling wave does exist, and possesses a sharp 120° wave crest. 
For standing waves, a similar conjecture was made by Penney and Price in 1952 1 13 1, who predicted that the limiting 
extreme wave would develop sharp, 90 degree interior crest angles each time the fluid comes to rest. As discussed 
in the introduction, numerous experimental, theoretical and numerical studies lfT4l [TSl [T6l |8l |29l |2] [T) have reached 
contradictory conclusions on the limiting behavior at the crests of extreme standing waves. 

Penney and Price expected wave height, defined as half the maximum crest-to-trough height, to increase mono- 
tonically from the zero-amplitude equilibrium wave to the extreme wave. Mercer and Roberts found that wave height 
reaches a turning point, achieving a local maximum of h - 0.62017 at Ac - 0.92631, where Ac is the downward 



acceleration of a fluid particle at the wave crest at the instant the fluid comes to rest (assuming g - 1 in (2.1 1). Since 



h is not a monotonic function, they proposed using crest acceleration as a bifurcation parameter instead. However, as 



shown in Fig. 10 crest acceleration also fails to be a monotonic function. The bifurcation curve that was supposed to 
terminate at the extreme wave when Ac reaches 1 becomes fragmented for 0.99 < Ac < 1 . Just as in the finite depth 
case, this fragmentation is due to resonant interactions between the large-scale carrier wave and various smaller-scale. 



secondary standing waves that appear at the surface of the primary wave. Figure 1 1 shows several examples of the 
oscillatory structures that are excited by resonance, both in space and in time. The amplitude of the higher-frequency 
oscillations are small enough in each case that the vertical position of a particle traveling from trough to crest increases 
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Figure 1 1 : Oscillatory structures near the wave crest, and time-evolution of a particle from trough to crest, for several extreme standing waves. 
Labels correspond to the bifurcation diagrams in Figure Qo] These solutions take the form of higher-frequency standing waves superimposed 
nonlinearly on lower-frequency earner waves. The higher-frequency oscillations occur both in space and time. 
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Figure 12: Evolution of surface height and its derivatives over a quarter period for solution O in Figure [TO] The plots at right of 77,-^ are shown only 
at time r/4 for clarity. In the far right panel, we also plotted solution E for comparison. Each solution that terminates a branch in the bifurcation 
diagram is highly oscillatory; we followed the branches to the point that the computations became too expensive to continue further 



monotonically in time; however, plotting differences of solutions on nearby branches as a function of time reveals the 
temporal behavior of the secondary standing waves. We note that the frequency of oscillation of the secondary waves 
(near x - n) decreases as t approaches r/4. This seems reasonable as fluid particles at the crest are nearly in free-fall 
at this time when Ac is close to 1 ; thus, the driving force of the secondary oscillations is low there. In shallow water, 
the secondary oscillations are often strong enough to lead to non-monotonic particle trajectories from trough to crest 
(see Section |43] l. 

If a limiting wave profile does not materialize as Ac approaches 1, a natural question arises as to what will terminate 
the bifurcation curves. In IfTTl . it was emphasized that oscillations at the crest tip prevent self-similar sharpening to 
a comer, as happens in the traveling case. We note here that the entire wave profile, not just the crest tip, develops 
high-frequency oscillations on small scales toward the end of each bifurcation curve. This is illustrated in Fig.[T2j and 
suggests that if these bifurcation curves do end somewhere, without looping back to merge with another disconnection, 
it may be due to solutions becoming increasingly rough, with some Sobolev norm diverging in the limit. We also 
remark that since many of these standing waves come close to forming a 90° corner, there may well exist nearly time- 
periodic solutions that do pass through a singular state. Taylor's thought experiment lfT4l in which water is piled up 
in a crested configuration and released from rest could be applied to a sharply crested perturbation of the rest state of 
one of our standing waves. However, like Taylor, we see no reason that 90° would be the only allowable crest angle. 
It is conceivable that 90° is the only angle for which smooth solutions can propagate forward from a singular initial 
condition, but we know of no such results. 

The increase in roughness of the solutions as crest acceleration approaches 1 may also be observed by plotting 



Fourier mode amplitudes for various solutions along the bifurcation curve. In Fig. 13 we compare the Fourier spectrum 
of 77 at f = and t - T/4 for solutions O and A. In both of these simulations, we parametrized the curve non-uniformly, 
as in ( 2.10| l, to increase resolution near the crest tip. Thus, a distinction must be made between computing Fourier 



modes with respect to x versus a 

%(f)=^j| 7]ix,t)e-""'dx, or fjdO^^j^ ii{^n,){a),t)e-''" da. (4.41) 
In this figure, we use the latter convention, since rj o ^1 and (f o are the quantities actually evolved in time, and the 



decay rate of Fourier modes is faster with respect to a. Ait - 0, the two formulas in (4.41 1 agree since we require 



19 



Fourier modes fikit) for solution O Fourier modes f]k{t) for solution A 
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Figure 13: The Fourier modes of f; (shown) and ifi (not shown) are monitored to decide how many grid points are needed to resolve the solution. The 
parameter p; controls the nonuniform spacing of gridpoints via j2.10^ . The real (black) and imaginary (grey) parts of fjkit) are plotted in positions 
2k and 2k + i, respectively. (Left) the minimization was performed in double precision to obtain these initial conditions. The result was checked in 
quadruple precision to eliminate roundoff error (Right) the minimization was performed in quadruple precision, yielding / = 2. 1 X 10"'". 



^i(a) - a. For solution O, the Fourier modes of the initial conditions decay to |q| < 10 for 2k > 3000. At f = T/4 
we have max(|?7/.|, \ipi,\) < 10"'^ for 2k > 6500 with py - 0.09. For solution A, we have |q| < 10"^'' for 2k > 400 and 
max(|%|, \(pk\) < IQ-^'^ for 2k > 800 at t = T/4 with p„ = 0.4. Here T = 1.629324 for solution O and T = 1.634989 
for solution A. The mesh parameters used in these simulations are listed in Table[T] We remark that for solutions such 
as O with fairly sharp wave crests, decreasing p,, generally leads to faster decay of Fourier modes, but also amplifies 
roundoff errors due to closer grid spacing near the crest. Further decrease of py in the double-precision calculation 
does more harm than good. 
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Table 1: Mesh parameters used to compute solutions A and O in Figures [13] and[T4] 



The eff'ect of roundoff-error and the 36th order filter can both be seen in the second panel of Fig. 13 In exact arith- 
metic, rjix, t) and <f{x, f) would remain even functions for all time. However, in numerical simulations, the imaginary 
parts of fjkit) and (pkif) drift away from zero, giving a useful indicator of how much the solution has been corrupted by 



roundoff error. The filter ( 2. 14 1 has little eff'ect on the first 70 percent of the Fourier modes, but strongly damps out the 
last 15 percent. By monitoring the decay of Fourier modes through plots like this, one can ensure that the simulations 
are fully resolved, and that filtering does not introduce more error than is already introduced by roundoff" error We 
also monitor energy conservation, 

E(t) = - J^^ [ip(x, t)Qipix, t) + gj](x, tf] dx, 

choosing time-steps small enough that E remains constant to as many digits as possible, typically 14 in double- 
precision and 29 in quadruple precision. Note that Q depends on time through rj. 

Because solution A remains smoother and involves many fewer Fourier modes than solution O, it was possible 



for us to perform the entire computation in quadruple precision arithmetic. This allowed us to reduce / in (3.17 1 
to 2.1 X 10"^*^. As shown in Fig. 14 the velocity potential of this solution drops from (9(1) at f = to less than 



3 X 10"^^ at f = r/4, in the uniform norm. While it was not possible to perform the minimization for solution O 
in quadruple precision arithmetic (due to memory limitations of the GPU device), we were able to check the double- 
precision result in quadruple precision to confirm that / is not under-predicted by the minimization procedure. Because 
the DOPRI8 and SDC15 methods involve 12 and 99 internal Runge-Kutta stages per time-step, respectively, more 
function evaluations were involved in advancing the quadruple-precision calculations through time even though A^; is 



larger in the double-precision calculations. As shown in Fig. 14 the oscillations in (f{x, T/4-) remain fully resolved in 
the quadruple precision calculation; thus, / = 8.6 x 10"^^ is an accurate measure of the squared error The predicted 
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Figure 14: Plots of the residual ifi(.x, T/4) for solutions O and A. (left) When minimized in double-precision arithmetic, we obtain / = 1.3 X 10" . 
Shown here is a re-computation of the solution in quadruple precision on a finer mesh using the same initial conditions. This yields / = 8.6 X 10"^', 
which is even smaller than predicted in double-precision. Re-spacing the grid maps the curve Ey(a) to Ey(^y^(x)), improving the resolution that 
can be achieved with 9216 gridpoints. Note that the oscillations in ifi(x, T/4) are fully resolved, (right) The minimization was peii'ormed in 
quadruple-precision arithmetic, yielding / = 2.1 X 10"'". The velocity potential is nearly 30 orders of magnitude smaller at f = T/4 than at / = 0. 



value of / in double-precision (obtained by minimizing /) was / = 1.3 x 10"^^. Since minimizing / entails eliminating 
as many significant digits of ip{x, T/4) as possible, the resulting value of / is not expected to be highly accurate. What 
is important is that minimizing / in double-precision does not grossly underestimate its minimum value. In fact, 
its value is often over-estimated, as occurred here. This robustness is a major benefit of posing the problem as an 
overdetermined non-linear system. The only way to achieve a small value of / is to accurately track a solution of the 



PDE for which the exact / is small. Roundoff errors and truncation errors will cause the components of r in (3.17 1 
to drift away from zero, leading to an incompatible system of equations with minimum residual of the order of the 
accumulated errors. 

There is a big advantage to choosing f = to occur at the midpoint between rest states rather than at a rest state. 
The reason is that many more Fourier modes are needed to represent rj when the wave crest is relatively sharp, which 



for us occurs when t is near r/4. For example, solutions O and A in Fig. 13 have more than twice as many active 
Fourier modes at f = r/4 as they did at f = 0, even using a non-uniform grid to better resolve the crested region. 
Setting up the problem this way leads to fewer Fourier modes of the initial condition to solve for, and increases the 
number of non-linear equations. Thus, the system is more overdetermined, adding robustness to the computation. 

We conclude this section by mentioning that "branch jumping" is very easy to accomplish (and hard to avoid) 
in the numerical continuation algorithm. For strong disconnections such as at solution B in Fig. |2j it is sometimes 
necessary to backtrack away from the disconnection and then take a big step, hoping to land beyond the gap. Since we 
measure the residual error in an overdetermined fashion, it is obvious if we land in a gap where there is no time-periodic 
solution — the minimum value of / remains large in that case. However, most disconnections can be traversed without 



backtracking, or even knowing in advance of their presence. The disconnections in Figure 10 were all discovered by 
accident in this way. Once a disconnection is observed, we can go back and foUow side branches to look for global 
re-connections or new families of time-periodic solutions. 

4.5. Counter-propagating solitary waves in shallow water 

In previous sections, we saw that decreasing the fluid depth causes nucleation of loop structures in the bifurcation 
curves that nearly (or actually) meet at imperfect (or perfect) bifurcations. Some of these disconnections persist in the 
infinite depth limit. We now consider the other extreme of standing waves in very shallow water. 



In Figure 15 we track the k - I family of standing waves out of the linear regime for water of depth h - 0.05 



and spatial period In. The period of the solutions in the linear regime is T = 2nl Vtanh0.05 = 28.1110, compared to 
T - 7.19976 when h - \ and T - In - 6.28319 when h - oo. Thus, the waves travel much slower in shallow water. 
We also see that T decreases with amplitude as the waves leave the linear regime, consistent with Tadjbakhsh and 
Keller's result. Equation ( |4.36| l above, that the sign of the quadratic correction term in angular frequency is positive 
for h < \ .058 1 . Many more disconnections have appeared in the bifurcation diagrams at this depth than were observed 
in the cases /z » 1 .0 and h - oo considered above. It was not possible to track all the side branches that have emerged 
to see if they reconnect with each other However, each time we detected that the minimization algorithm had jumped 
from one branch to another, we did backtrack to fill in enough points to observe which modes were excited by the 
resonance. In general, higher-frequency Fourier modes of the initial condition possess more disconnections, even 



though all the bifurcation curves describe the same family of solutions. For example, in Fig. 15 we see that 1^17(0) 
has much stronger disconnections than ^i(O), and those of ?736(0) are stronger still. This suggests that high-frequency 
resonances have little effect on the dynamics of lower-frequency modes. Nevertheless, there is some effect, since even 
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Figure 15: Bifurcation diagrams sliowing tlie dependence of i^i(O), 1^17(0), and /)36(0) on T for a family of standing water waves in shallow 
(/i = 0.05) water. Many more disconnections are visible at this depth than were observed in the h = 1.0 and h = 00 cases above. 
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Figure 16: Standing waves in shallow water take the form of counter-propagating solitary waves that interact elastically. The low-amplitude 
radiation normally associated with inelastic collisions is already present before the interaction, and does not increase as a result of the interaction. 
In solutions A and B, this radiation consists of small-amplitude, high-frequency standing waves over which the solitary waves travel. In solution C, 
this radiation is a chaotic mix of standing and counter-propagating traveling waves of different wave numbers. 



^i(O) exhibits visible disconnections, and in fact has some gaps in T where solutions could not be found. We interpret 
these gaps as numerical manifestations of the Cantor-like structures that arise in analytical studies of standing water 
waves due to small divisors |24, 20|. This will be discussed further in the conclusion section. 



In Figure 16 we show time-elapsed snapshots of the standing wave solutions labeled A-C in Figure 15 These 
standing waves no longer lead to large scale sloshing modes in which the fluid rushes from center to sides and back 
in bulk. Instead, a pair of counter-propagating solitary waves travel back and forth across the domain, alternately 
colUding at X - n and x = at times t = T/4 + (r/2)Z. In the unit depth case above, we observed in Figure [3] that 
disconnections in the bifurcation curve correspond to secondary standing waves appearing with one of two phases at 
the surface of a primary carrier wave. The same is true of these solitary wave interactions. While it is difficult to 



observe in a static image, movies of solutions A and B in Figure 16 reveal that the primary solitary waves travel over 
smaller standing waves with higher wave number and angular frequency. As a result, a fluid particle at x - n will 
oscillate up and down with the secondary standing wave until the solitary waves collide, pushing the particle upward 



a great distance. By contrast, in the infinte-depth case, we saw in Figure 1 1 that rjin, t) increases monotonically from 
trough to crest in spite of the secondary waves. Each time a disconnection in Figure 15 is crossed, the background 
standing wave (or some of its component waves) change phase by 180°. Solution B is positioned near the center of a 
bifurcation branch, far from major disconnections in the bifurcation curves. As a result, the water surface over which 
the solitary waves travel remains particularly calm for solution B. By contrast the background waves of solution C 
are quite large in amplitude, with many active wave numbers. A Floquet stability analysis, presented elsewhere Ii42l . 
shows that solutions A and B are linearly stable to harmonic perturbations while solution C is unstable. 

4.6. Gravity-capillary standing waves 

In this section, we consider the eff'ect of surface tension on the dynamics of standing water waves. We restrict 
attention to waves of the type considered by Concus ||231 and Vanden-Broeck |70|, leaving collisions of gravity- 
capillary waves Ii53l for future work [42 1. The only change in the linearized equations ( |4.34| i when surface tension is 
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Figure 17: Bifurcation diagrams showing the dependence of various Fourier modes of the initial conditions on the period for standing water waves 
with surface tension. The bifurcation curve splits into several disjoint branches between solutions B and C as resonant waves appear on the fluid 
surface. The curve labeled 'quadratic correction' is given in Equation ^4.43^ . 
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Figure 18: Time-elapsed snapshots of four standing waves over a quarter-period. At f = 0, a pair of counter-propagating depression waves move 
away from each other as the fluid flows to the center. Solutions B, C and D exhibit higher-frequency standing waves oscillating on the surface of 
the low-frequency carrier wave. All of the solutions reach a rest state where if = Oatt = T/4. 



included is that (p, - P[- gi] + (cr / p)fi^jc]. The standing wave solutions of the linearized problem continue to have the 
form ( |4.35| ), but with 

(4.42) 



-\g + —k^\ktanhkh. 



A/B = ^ktimhkhl[g + {a-lp)k^ 



We choose length and time-scales so that ^ = 1 and cr/p - 1. For simplicity, we consider only the ^ = 1 bifurcation in 
the infinite depth case, and continue to assume all functions are 27r-periodic in space. In this configuration, the period 
of the linearized standing waves is T = Inj V2 ~ 4.443. For real water (assuming a -12 dyne/cm), 4.443 units of 
dimensionless time corresponds to 0.0739 seconds, and 2n spatial units corresponds to 1 .7 cm. 

^i(O), increases in magnitude beyond 
oo. Quantitatively, our 



The results are summarized in Figure 17 As the bifurcation parameter, ci 
the realm of linear theory, the period increases, just as in the zero surface tension case for h 
results agree with Concus' prediction |23 1 that 



ci =^i(0) 



(4.43) 



in the infinite depth case when the surface tension parameter 5 :— crk /(crk + pg) is equal to 1/2. Equation (4.43 i is 



plotted in the left panel of Figure 17 for comparison. Shortly after solution B (cj = -0.464, T - 5.10), a complicated 



sequence of imperfect bifurcations occurs in which several disjoint families of solutions pass near each other Com- 



parison of solutions B, C and D in Figure 18 suggests that these disconnections are due to the excitation of different 



patterns of smaller-scale capillary waves oscillating on the free surface. An interesting diff'erence between these stand- 
ing waves and their zero surface-tension counterparts (e.g. in Figure |2]l is that the "solitary" waves that appear in the 
transition periods between rest states of maximum amplitude are inverted. Thus, we can think of these solutions as 
counter-propagating depression waves Il86ll87ll that are tuned to be time-periodic, just as the zero surface-tension case 
leads to counter-propagating Stokes waves. The depression waves travel outward as fluid flows to the center, whereas 



the Stokes waves travel inward, carrying the fluid with them. Figure 19 shows snapshots of particle trajectories for 
solution C, color coded by pressure. The methodology for computing this pressure is given at the end of Appendix [A] 
Negative pressure (relative to the ambient air pressure po in (2.4 1, which is set to zero for convenience) arises beneath 



the depression waves as they pass, which leads to larger pressure gradients, faster wave speeds, and shorter periods 
than were seen in previous sections. 
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Figure 19: A more detailed view of solution C from Figure [T8] at times f = and / = r/4 showing regions of negative pressure beneath depression 
waves. These images are taken from movies in which passively advected particles have been added to the fluid for visualization, color coded by 
pressure using ^A.12^. A secondary standing wave leads to visible variations in curvature and pressure at time r/4. 



4.7. Performance comparison 

We conclude our results with a comparison of running times for the various algorithms and machines used to gen- 
erate the data reported above. Our machines consist of a laptop, a desktop, a server, a GPU device, and a supercluster. 
The laptop is a Macbook Pro, 2.53 GHz Intel Core 15 machine. The desktop is a Mac Pro with two quad-core 2.8 
GHz Intel Nehalem processors. The rackmount server has two six-core 3.33 GHz Intel Westmere processors and an 
NVidia M2050 GPU, and is running Ubuntu Linux. The cluster is the Lawrencium cluster (LRl) at Lawrence Berkeley 
National Laboratory. Each node of the cluster contains two quad-core 2.66 GHz Intel Harpertown processors. Intel's 
math kernel library and scalapack library were used for the linear algebra on Lawrencium. 

Our first test consists of computing the first 120 deep-water standing wave solutions reported in Figure 10 (up 
through solution B). The running times increase with amplitude due to an increase in the number of gridpoints (M), 
timesteps (A^), and unknown Fourier modes of the initial conditions (n). The parameters used in this test are given in 
Table [2] with running times reported in the left panel of Figure 20 For each index range, we computed the average 
time required to reduce / below 10"^^ (or 10"^" in quadruple precision), using linear extrapolation from the previous 
two solutions as a starting guess. In the Adjoint Continuation Method, the first solution in each range (with index 101, 
150, 174, etc.) takes much longer than subsequent minimizations. This is because we re-build the inverse Hessian 
from scratch when the problem size changes, but not from one solution to the next in a given index range. This is 
illustrated in the figure by plotting the maximum and median number of seconds required to find a solution in a given 
index range, along with the average. 

For these smaller problems, the DOPR15 and DOPRI8 schemes are of comparable efficiency for double-precision 
accuracy. We used the former for this particular test. In quadruple precision, we switched to the SDC15 scheme, which 
is more efficient than DOPR15 and DOPR18 in reducing / below 10"^*^. We also doubled M and n in the quadruple- 
precision runs. The MINPACK benchmark results were optimized as much as possible (using the GPU with Error 
Correcting Code (ECC) turned off) to give as fair a comparison as possible. For the benchmark, the Jacobian was 
computed via forward differences, as in (3.33 i. The ACM method works well on small problems, but starts to slow 
down relative to the benchmark around M - 1024. The trust region shooting method is much faster than the ACM (and 
the benchmark) due to the fact that all the columns of the Jacobian employ the same Dirichlet to Neumann operator 
at each timestep. Thus, we save a factor of n in setup costs by computing n columns of the Jacobian simultaneously. 
Moreover, most of the work can be organized to run at level 3 BLAS speed. Note that the GPU is slower than the 
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Table 2: Parameters used in the performance comparison for small problems. Here "start" and "end" give the values of the bifurcation parameter 
(bif par) at the endpoints of the corresponding segment of the bifurcation curve. The solutions at these endpoints (with index 150, 174, 184, etc.) 
are computed twice, once on the coarse mesh and once on the fine mesh. Tstan and Tend are the periods at the endpoints. Wquad is the number of 
timesteps (of the SDC scheme) used in the quadruple precision calculations. Mq„ad and nq„nd were set equal to 2M and 2«, respectively. 
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Figure 20: Performance of the algorithms on various architectures, (left) Each data point is the average running time (in seconds per solution) for 
solutions in each index range listed in Table |2] For the Adjoint Continuation Method (ACM), which takes much longer for the first solution than 
subsequent solutions due to re-use of the Hessian information, we also report the longest running time and the median running time, (right) Time 
taken in each segment of the mesh-refinement strategy to evolve solution O in Figure [Tl] through l/60th of a quarter-period. The parameters for 
each segment are given in Table[3] The times listed for the Jacobian are the cost of evolving all 1200 columns thi'ough time 7/240. 
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Table 3: Pai'ameters used in the performance comparison for a large problem (solution O of Figure [T2) . Here d\ is the number of timesteps to 
advance the solution through one sixtieth of a quarter period (7/240). The total number of timesteps on segment / is A'; = 6O0;£/; in this case, while 
the number of function evaluations is 12^/ for DOPRI8 and 99^/ for SDC15. 



multi-core CPU in double-precision for small problems, but eventually wins out as the opportunity for parallelism 
increases. In quadruple-precision, the GPU is substantially faster than the CPU for all problem sizes tested as there is 
more arithmetic to be done relative to communication costs. 



Our second test consists of timing each phase of the computation of solution O in Figure 12 The parameters used 
for the performance comparison are given in Table [3] For the double-precision calculation, we later refined the mesh 
to the values listed in Table [T] in Section |4.4[ h owever, this was done on one machine only. (The value of / here 
is 3.9 X 10"^^ versus 1.3 x 10""^ in Section |4!4l ) The quantities di in Table |3] are the number of timesteps between 
mile-markers where the energy and plots of the solution were recorded. In this case, we recorded 60 slices of the 
solution between f = and t - r/4. The running times in the right panel of Figure 20 report the time to advance 
from one mile-marker to the next. It was not possible to solve this problem via the ACM method or MINPACK, so 
this test compares running times of the trust region method on several machines. In quadruple precision, we evolved 
the solution but did not compute the Jacobian. Two of the jobs on the cluster (1 node and 2 nodes) were terminated 
early due to insufficient available wall-clock time. When using the GPU, there is little improvement in performance 
in also running openMP on the CPU. For example, switching from 12 threads (shown in the figure) to one thread (not 
shown) slows the computation of the Jacobian by about 10 percent, but speeds up the computation of the solution by 
about 1 percent. When evolving the solution on a large problem, the GPU is fully utilized; however, when evolving 
the Jacobian, the GPU is idle about 60 percent of the time. Thus, we can run 2-3 jobs simultaneously to improve the 
effective performance of the GPU by another factor of 2 over what is plotted in the figures. This is also true of the 
Lawrencium cluster — while using more nodes to solve a single problem stops paying off around 8 nodes, we can run 
multiple jobs independently. Most of the large-amplitude solutions in Figure 10 were computed in this way on the 
Lawrencium cluster, before we acquired the GPU device. 



5. Conclusion 

We have shown how to compute time-periodic solutions of the free-surface Euler equations with improved res- 
olution, accuracy and robustness by formulating the shooting method as an overdetermined nonlinear least squares 
problem and exploiting parallelism in the Jacobian calculation. This made it possible to resolve a long-standing open 
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question, posed by Penney and Price in 1952, on whether the most extreme standing wave develops wave crests with 
sharp 90 degree corners each time the fluid comes to rest. Previous numerical studies reached different conclusions 
about the form of the limiting wave, but none were able to resolve the fine-scale oscillations that develop due to 
resonant effects. While we cannot say for certain that no standing wave exists that forms sharp corners at periodic 
time-intervals, we can say that such a wave does not lie at the end of a family of increasingly sharp standing waves 
parametrized by crest acceleration, Ac- Indeed, crest acceleration is not a monotonic function, and the bifurcation curve 
becomes fragmented as ^ 1, with different branches corresponding to different fine-scale oscillation patterns that 
emerge at the surface of the wave. Following any of these branches in either direction leads to increasingly oscillatory 
solutions with curvature that appears to blow up throughout the interval [0, 2n], not just at the crest tip. 

Small-amplitude standing waves have been proved to exist in finite depth by Plotnikov and Toland |24|, and in 
infinite depth by Plotnikov, Toland and looss |20|. However, the proofs rely on a Nash-Moser iteration that only 
guarantees existence for values of the amplitude in a totally disconnected Cantor set ll25ll26l . In shallow water, with 



h - 0.05, we do see evidence that solutions do not come in smooth families. For example, in Figure 15 the number of 
visible disconnections in the bifurcation diagrams increases dramatically from i^i(O) to 1^17(0) to 7736(0). There are also 
a few gaps along the T-axis where the numerical method failed to find a solution, i.e. the minimum value of / did not 
decrease below the target of lO"^*" regardless of how many additional Fourier modes were included in the simulation. 
It is easy to imagine that removing all the gaps that arise in this fashion as the mesh is refined and the numerical 
precision is increased could lead to a Cantor-like set of allowed periods. 

Our numerical method measures success by how small the objective function / and residual r become. It will 
succeed if it can find initial conditions that are close enough to those of an exactly time-periodic solution, or at least of 
a solution that is time-periodic up to roundoff error tolerances. For the residual to be small, the bifurcation parameter 
must nearly belong to the Cantor set of allowed values, but membership need not be exact. If the Cantor set is fat 
enough (i.e. has nearly full measure), then most values of the bifurcation parameter will be close to some element of 
the set — roundoff error fills in the smallest gaps. While it is possible that our numerical method would report a false 
positive, this seems unlikely. The residuals of our solutions are not under-predicted by the minimization algorithm 



due to formulation of the problem as an overdetermined system. Indeed, we saw in Figure 14 that / decreases from 
1.3 X 10"^^ to 8.6 X 10"^^ for solution O when the initial conditions are evolved on a finer mesh in quadruple precision, 
and decreases from 1.9 x lO""** to 2.1 x 10"^" for solution A when the minimization is repeated in quadruple precision. 
This latter test is particularly convincing that the method is converging to an exactly time-periodic solution. 

If standing waves on water of infinite depth do not come in smooth families, as suggested by the analysis of EOl . 
they are remarkably well approximated by them. Prior to our work, no numerical evidence of disconnections in the 
bifurcation curves had been observed. Wilkening |17| found several disconnections for values of crest acceleration 
Ac > 0.99, but none at smaller values. As shown in Figure|2T| there is one additional disconnection around Ac - 0.947 
that can be observed in double -precision that was missed in Iil7j . However, the points at which resonance is supposed 
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Figure 21: Study of resonance in deep water standing waves, (left) Switching from double- to quadruple-precision arithmetic reveals only one 
additional disconnection in the bifurcation curves. The inset graph shows how the disconnections of Figure [lO] look when C47 is plotted rather than 
ci, C5 and C60- Solutions B and D are the points where IC47/C1I a 3 X 10"'*, just barely above the roundoff threshold. The gap in crest acceleration 
between these solutions is AdD) - Ac(B) = 1.4 X 10"'; thus, it is extremely unlikely in a parameter study that one would land in this gap. Outside 
of this gap, resonant effects from this disconnection are smaller than the roundoff threshold, (center and right) Resonance causes bursts of growth 
in the Fourier spectrum, but the modes continue to decay exponentially in the long run. 
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Figure 22: A closer look at the bifurcation structure in Figure^in the h = 1.03 case reveals a number of additional side-branches that trace back 
and forth over nearly the same curves when low-frequency modes are plotted (left), but become well-separated when high frequency modes ai'e 
plotted (right center, right). A small gap near T = 7.175 has formed on one of the wings in the plots of (^27(0) and ^69(0) vs T. 



to cause difficulty are expected to be dense over the whole range < < 1. We re-computed the solutions up to 
Ac = 0.8907 in quadruple precision, expecting several new disconnections to emerge in high-frequency Fourier modes. 
Surprisingly, we could only find one, at Ac = 0.658621. Using a bisection algorithm to zoom in on the disconnection 
in the 47th Fourier mode from both sides (using C5 as the bifurcation parameter), we were able to extend the side 
branches from C47 x +10"^^ to C47 ^ +10"'^. These side branches become observable in double-precision at points 



B and D in Figure |21j However, the gap in crest acceleration between solutions B and D is only 1.4 x 10 units 
wide. Thus, it is extremely unlikely that this resonance could be detected in double-precision without knowing where 
to look. Presumably the same issue prevents us from seeing additional disconnections in quadruple-precision. This 
suggests that the Cantor-like set of allowed values of the amplitude parameter is very fat, with gaps decaying to zero 
rapidly with the wave number of the resonant mode. 

In finite depth, with /z « 1, a connection can be seen between resonance and non-uniqueness of solutions. The 
main difference from the h = 0.05 and h - 00 cases is that for h ^ I, the disconnections lead to side-branches that can 
be tracked a great distance via numerical continuation, and are often found to be globally connected to one another. 
Traversing these side branches causes high-frequency modes to sweep out small-amplitude loop-shaped structures. 
These loops are "long and thin" in the sense that low-frequency modes trace back over the previously swept out 
bifurcation curves while traversing the loop, with little deviation in the lateral direction. For example, in Figure |22j 
the 27th Fourier mode executes a number of excursions in which it grows to around 10"*", causing the period and first 
Fourier modes to sweep back and forth over much larger ranges, 7.167 < T < 7.229 and -0.200 > (fi{0) > -0.262. 
These loops are plotted in Figures |7] and |9] as well, but the curves are indistinguishable from one another at this 
resolution since the lateral deviations are so small. Looking at the third panel of Figure 22 one might ask, "how 
many solutions are there with period T = 7.2." If we had not noticed any of the disconnections (note the exponential 
scaling of the axis), we would have answered 1. If we had only tracked the outer wings, we would have answered 3. 
Having tracked all the branches shown, the answer appears to be 5. But of course there are probably infinitely many 
disconnections in higher-frequency Fourier modes that we did not resolve or track, and some of these may lead to 
additional solutions with T - 1.2. Physically, all these crossings of T = 1.2 correspond to a hierarchy of "standing 
waves on standing waves," with different mode amplitudes and phases working together to create a globally time- 
periodic solution with this period. The fact that the low-frequency bifurcation curves sweep back and forth over nearly 
the same graph reinforces the physically reasonable idea that high-frequency, low amplitude waves oscillating on the 
surface of low-frequency, large amplitude waves will not significantly change the large-scale behavior 

In summary, time-periodic water waves occur in abundance in numerical simulations, and appear to be highly non- 
unique, partly due to the Wilton's ripple phenomenon of mixed-mode solutions co-existing with pure-mode solutions 
near a degenerate bifurcation, and also due to a tendency of the bifurcation curves to fold back on themselves each 
time a resonant mode is excited. Proofs of existence based on Nash-Moser iteration must somehow select among these 
multiple solutions, and it would be interesting to know whether the Cantor-like structure in the analysis is caused by 
a true lack of existence for parameter values outside of this set, or is partly caused by non-uniqueness. Finally, we 
note that most of the disconnections in the numerically computed bifurcation curves disappear in the infinite depth 
limit, and remarkably small residuals can be achieved with smooth families of approximate solutions. This calls for 
further investigation of the extent to which the obstacles to proving smooth dependence of solutions on amplitude can 
be overcome or quantified. 
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A. Boundary integral formulation 



While many numerical methods exist to evolve iiTotational flow problems 1431 l44l |451 l2l IMl l49l [37l I5l1 l53l 
we have found that a direct boundary integral implementation of is the simplest and most effective approach 
for problems where 77 remains single valued, i.e. the interface does not overturn. Suppressing t in the notation, we 
represent the complex velocity potential <l>(z) = (p(z) + it//(z) as a Cauchy integral Ii89l 



1 



11(a) da. 



aa) = ^(a) + irj(^{a)). 



(A.l) 



2ni ^(a) - z 

where z is a field point in the fluid, ^{a) is the (real-valued) dipole density, ((a) parametrizes the free surface, PV 
indicates a principal value integral, r]{x) retains its meaning from equation ( 2. 1 1, and the change of variables x - ^(a) 
will be used to smoothly refine the mesh in regions of high curvature. The minus sign in ( A. 1 1 accounts for the fact that 
Cauchy integrals are usually parametrized counter-clockwise, but we have parametrized the curve so the fluid region 
lies to the right of ^(a). When the fluid depth is finite, we impose the tangential flow condition using an identical 
double-layer potential on the mirror image surface, ^(a). This assumes we have set /z = in (2.3 1, absorbing the mean 
fluid depth into r] itself. We also use 4 cot | = PV Yjk Tj4-T to sum (A.l 1 over periodic images. The result is 
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z+2nk 



— cot 



fi(a) da. 



(A.l) 



Note that (J is real-valued on the x-axis, indicating that the stream function i// is zero (and therefore constant) along 
the bottom boundary. 

As z approaches ((a) from above (-1-) or below (-), the Plemelj formula f|89l gives 
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We regularize the principal value integral by subtracting and adding 5 cot from the first term in brackets 
l50ll . The result is 
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[K^{a,p) + K2{a,|3)]^i(J3)d|3, 



where Hf{a) = ^PV £^ ^da^ ^P^fo" ^ ^ot (^) df3 is the Hilbert transform and 
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Ki{a,B) = — — cot cot — — , 
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~ C(J3) J{a)-m 
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(A.5) 



We note that Ki(a,/]) is continuous at = a if we define Ki(a, a) - -q'(a)/[2('(a)]. Taking the real part of ( A.4i at 
z - ^(a)" yields a second-kind Fredholm integral equation for fi{a) in terms of ipi^a)). 
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(A.6) 



where Kj{a,/3) - lm{Kj(a,/3)}. Once iu(a) is known, it follows from (A. 2 1 that 



0'(z) = u(z) - iv(z) 
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where yia) - fi'(a) is the (normalized) vortex sheet strength. As z approaches ((a) from above or below, one may 
show IIID that 



1 PV 
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(A.8) 



Note that ^' is evaluated at fi in (A. 3 i and at o- in (A.8 1 inside the integral. We regularize the principal value integral 
using the same technique as before to obtain 
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(A.9) 



where 

G2{a,/3) = cot . (A. 10) 

G\{a,p) is continuous at/? = ff if we define G\{a,a) - ("{a)l\2^'{a)\. We could read u - and v = 0,, 
from (A.9 1 for use in the right hand side of (2.1 1. Instead, as an intermediate step, we compute the output of the 
Dirichlet-Neumann operator defined in \2.6\, 



\^'{a)\0ip{qa)) = \C\a)\^{aa)) = lim {a)[u{z) - iv{z)]} 

On z-^i(a)- 
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[G,{a,li) + G2{a,l3)]y(J3)dl3. 



Here Gj(a,/5) = Re{Gj(a,/3)} and iq{a)/\q(a)\ represents the normal vector to the curve. Note that the dot product 
of two complex numbers z and w (thought of as vectors in M?) is Re{zw). Once @ip{x) is known, we can evaluate the 
right hand side of \2.\\ using (2.9 1. 

For visualization, it is often useful to evaluate the velocity and pressure inside the fluid. The velocity was already 
given in terms of the vortex sheet strength in (A. 7 1 above. For pressure, we use the unsteady Bernoulli equation 



*r+ i|V0|'+^y + ^ = c(f), 

2 p 



(A. 12) 



where c{t) was given in (2.4i. One option for computing is to diff'erentiate (A.6l with respect to time to obtain an 
integral equation for ji, (see |92|), then express 4>t in terms of ju, by differentiating (A.2i. A simpler approach is to 
differentiate the Laplace equation ( 2.3 i with respect to time. The value of (p, on the upper boundary is ip, - (pyTj,, which 
is known. Since the real part of (A. 2 1 gives the solution 0(z) of Laplace's equation with boundary condition (p on the 
upper surface, we can replace (p with tp, - (pyj], in (A. 6 1 to convert (A. 2 1 into a formula for <p,{z) instead. 



B. Linearized and adjoint equations for the water wave 



In this section we derive explicit formulas for the variational and adjoint equations of Sections 3.2 and 3.3 A dot 
will be used to denote a directional derivative with respect to the initial conditions. The equation qt - DF(q)q of 
( 3 .231 1 is simply 



T](X, 0) = 77o(x), 



rit + r].y4>.y + T],q 



-{r}x<Px4>y 



1 



+11. 

cr 



-gv- 



d. 



ip{x,0) = (poix), 

(pxx + 4>yy - 0, 

i,,77 = 0,, + (f>yyf], 

ix 

(1+772)3/2 



0, 


(B.la) 


y <ri^ 


(B.lb) 


-h. 


(B.lc) 




(B.ld) 


J], 


(B.le) 


J]- 


(B.lf) 



29 



Note that evaluation of ^{x, y, t) on the free surface gives {ip{x) - (pyix, rjix), f)fi(x)\ rather than (p{x) due to the boundary 
perturbation. Making use of (pyy - -0f v, (B.le i can be simplified to 



ill = (0v - ri.4x) - {il^x)'. 



(B.2) 



where a prime indicates an x-derivative along the free surface, e.g. /' := j-^f{x,rj(x),t) - + rj^fy. Equation (B.lfi 
may also be simplified, using 

{rixfpx^y + ]j(t>l - ^0^^) = ilx^x(l>y + J]x4>x'/>y + r]x<Pxyr}(f>y + r]_,(f>_,^y + J]x(l>x(f>yyj] + 'f>x4>x + (f>x<Pxyr] ' ^y(py - (/>y(pyyT] 

= (;70.,<^,)' + <^.v0' - - '?.v</'.)- (B.3) 



The equation - DF{q)*q is obtained from 
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The right-hand side must now be re-organized so we can identify q^. P is self-adjoint, so it can be transferred from 
the bracketed term to if. The underlined terms may be written ^0, where 4> is evaluated on the free surface. Green's 
identity shows that § is self-adjoint. Indeed, if and x satisfy Laplace's equation with Neumann conditions on the 
bottom boundary, then 
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(B.5) 



Thus, from (|R4|, we obtain 
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where is an auxiliary solution of Laplace's equation defined to be (fj+cpyP^) on the free surface. Finally, we substitute 
ip - <p - (pyfj and match terms to arrive at the adjoint system 



fjix, 0) = 0, ip{x, 0) = ipix, T/4), 
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The initial conditions (B.6ai are specific to the objective function ( 3.21[ ), but are easily modified to handle the alter- 
native objective function ( 3.18| l. Note that the adjoint problem has the same structure as the forward and linearized 
problems, with a Dirichlet to Neumann map appearing in the evolution equations for ij and ip. We use the boundary 
integral method described in Appendix |A] to compute Qx^ ™d employ a dense output formula to interpolate 77 and tp 



between timesteps at intermediate Runge-Kutta stages of the adjoint problem, as explained in Section 3.2 



C. Levenberg-Marquardt implementation with delayed Jacobian updates 

Since minimizing / in p.l7| i is a small-residual nonlinear least squares problem, the Levenberg-Marquardt method 
BOl is quadratically convergent. Our goal in this section is to discuss modifications of the algorithm in which re- 
computation of the Jacobian is delayed until the previously computed Jacobian ceases to be useful. By appropriately 
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adjusting the step size in the numerical continuation algorithm, it is usually only necessary to compute the Jacobian 
once per solution. Briefly, the Levenberg-Marquardt method works by minimizing the quadratic function 



/appi-ox(;5) = m + g'p + -i/Bp, g = v/(c) = f{cy(c), B = J{cfj{c) 



(C.l) 



over the trust region ||p|| < A. The true Hessian of / at c satisfies H - B - 2, riV^rj, which is small if r is small. 
The solution of this constrained quadratic minimization problem is the same as the solution of a linear least-squares 
problem with an unknown parameter A: 
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Formulating the problem this way (instead of solving (B + AI)p — -g) avoids squaring the condition number of J . 
Rather than use the MINPACK algorithm Il40l to find the Lagrange multiplier A, we compute the (thin) SVD of J, and 
define 



J ^USV , 5 = diag{cr), p^V p 
Here U ismxn and S - is n x n. This leads to an equivalent problem 
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which can be solved in 0(n) time by performing a Newton iteration on t(A), defined as 
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It is easy to show that r is an increasing, concave down function for A > (assuming S is non-singular); thus, if 
t(0) < 0, the Newton iteration starting at /l**^' - will increase monotonically to the solution of (C.4i with t(/1^'') 
increasing to zero. This Newton iteration is equivalent to 



/ = 0, /l*"* =0, po = arg miup \\S p + r\\ 
> tol 



while (Mi^ 
= + 

pi = arg mmp 

end 
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We use tol - 10"'^ in double-precision and lO"^"* in quadruple precision. It is not critical that A be computed to such 
high accuracy, but as the Newton iteration is inexpensive once the SVD of / is known, there is no reason not to iterate 
to convergence. At the end, we set p = Vp. 

We remark that it is more common to compute /I by a sequence of QR factorizations of [J; VaP^ I], as is done in 
MINPACK. However, the SVD approach is simpler, and similar in speed, since several QR factorizations have to be 
performed to compute A while only one SVD must be computed. Moreover, we can re-use J several times instead of 
re-computing it each time a step is accepted. When this is done, it pays to have factored 7 = USV^ up front. 

Delaying the computation of J requires a modified strategy for updating the trust region radius, as well as a means 
of deciding when the minimization is complete, and when to re-compute J. Our design decisions are summarized as 
follows: 

1. The algorithm terminates if / = 0, or if c is unchanged from the previous iteration (i.e. c + p equals c in 
floating point arithmetic), or if the algorithm reaches the roundoff-regime phase, and then a step is rejected or stepsJ 
reaches max_stepsJ . Here stepsJ counts accepted steps since J was last evaluated, and the roundoff .regime phase 
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begins if / < /toi or A < gtoh where the tolerances and maxstepsJ are specified by the user. If the Jacobian has 
just been computed (i.e. stepsJ - 0), we also check if ||^|| < ^,oi or \df\/f < dfiox to trigger roundojf .regime. Here 
df - /approx(c + p) - f(c) is the predicted change in / when minimizing the quadratic model /approx over the trust 
region, and dftoi is specified by the user We used 

= 10"2^ gtoi = 10"'^ maxjtepsJ = 10, ^//fi = 10"^ 

The idea of roundoff-regime is to try to improve / through a few additional residual calculations without recomput- 
ing J. 

2. Steps are accepted if p = [f(c + p) - f(c)]/df > 0; otherwise they are rejected. Note that p is the ratio of 
the actual change to the predicted change, the latter being negative. We also use p to adjust A. If p < po - 1/4, we 
replace A by \\p\\ times oq = 3/8. If p > pi = 0.85 and \\p\\ > 0.9 A, we multiply A by ffi = 1.875. Otherwise we 
leave A alone. So far this agrees with the standard trust region mechanism [40J for adjusting A, with slightly different 
parameters. What we do diff'erently is define a parameter delta Jrigger to be a prescribed fraction, namely a2 - 0.2, of 
delta -first ^rejected, the first rejected radius after (or coinciding with) an accepted step. Note that the radius is rejected 
ip < Po), not necessarily the step (p < 0). The reason to wait for an accepted step is to let the trust region shrink 
normally several times in a row if the Jacobian is freshly computed (stepsJ - 0). 

3. The Jacobian is re-computed if roundoff. regime has not occurred, and either stepsJ reaches maxjtepsJ, or 
StepsJ > and A drops below deltaJrigger, or stepsJ > and \df\/ f < dfiox- This last test avoids iterating on an old 
Jacobian if the new residual is nearly orthogonal to its columns — there is little point in continuing if /approx cannot be 
decreased significantly. The parameters a,- were chosen so that 

max(Q'y, aQQ-j ) < ai < min(a'o, ffyO-i), (C.6) 

which triggers the re-computation of J if two radii are rejected in a row, or on a reject-accept-reject-accept-reject 
sequence, assuming \\p\\ = A on each rejection. Before computing 7, if delta -first .rejected has been defined since J 
was last computed, we reset A to 

A = delta_first_rejected I a\. 

This makes up for the decreases in A that occur due to using an old Jacobian. 

4. We compute r but not 7 if a step is rejected on a freshly computed Jacobian, or if a step is accepted or rejected 
without triggering one of the conditions mentioned above for computing J. 
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